Abstract

This research creates a general class of perturbation models which are described by an 
underlying null model that accounts for most of the structure in data and a perturbation 
that accounts for possible small localized departures. The perturbation models encompass
finite mixture models and spatial scan processes. In this article, (1) we propose a
new test statistic to detect the presence of perturbation, including the case where the
null model contains a set of nuisance parameters, and show that it is asymptotically
equivalent to the likelihood ratio test; (2) we establish that the asymptotic distribution
of the test statistic is equivalent to the supremum of a Gaussian random field over a
high-dimensional manifold (e.g., curve, surface, etc.) with boundaries and singularities;
(3) we derive a technique for approximating the quantiles of the test statistic using the
Hotelling-Weyl-Naiman volume-of-tube formula; and (4) we solve the long-pending problem
of testing for the order of a mixture model; in particular, we derive the asymptotic null
distribution for a general family of mixture models including the multivariate mixtures.
The inferential theory developed in this article is applicable for a class of non-regular statistical
problems involving loss of identifiability or when some of the parameters are on the 
boundary of the parametric space. 
1 Introduction and Motivation



A fundamental and yet a very challenging problem in finite mixtures is determining the 
order of a mixture model or mixture complexity. This problem has been under intense 
investigation for over thirty years (Wolfe, 1971; Roeder, 1990; Lindsay, 1995) with no 
practically feasible solution for a general class of mixture families. Establishing a valid 
large-sample theoretical framework along with a practically feasible machinery for testing 
the order of a mixture model formed from a broad class of densities remains an open 
problem and is the focus of this research. It has long been noted that testing for the 
number of mixture components is a non-regular problem (a) due to loss of identifiability 
of the null distribution (i.e., the parameters representing the null distribution are not 
unique) and (b) since the parameters under the null hypothesis are on the boundary 
of the parameter space, instead of its interior. Consequently, the likelihood ratio test 
(LRT) statistic does not have the standard asymptotic null distribution of chi-squared 
(Chernoff, 1954; Ghosh and Sen, 1985; Hartigan, 1985; Bickel and Chernoff, 1993). As 
noted by several authors, the asymptotic null distribution of the LRT statistic is highly 
complex and very difficult to simulate from in practice. 

The main thrust of this research is to create a fundamental class of models referred to 
as perturbation models and derive large-sample theory to detect the presence of pertur- 
bation. These models play an instrumental role in the development of inferential theory 
for a class of important problems such as (1) testing for the order of a mixture model 
formed from smooth families of densities, including the multivariate case; (2) searching 
for an unusual activity or region in the context of a spatial scan process; and (3) detecting
a signal in the presence of noisy backgrounds (Pilla et al., 2005). The resulting theory
has broad applications in astronomy, astrophysics, biology, medicine, particle physics
and data mining, to name a few.

1.1 Perturbation Models 

Let $\mathcal{P} = \{p(x; \eta, \lambda, \theta) : \lambda \in \Lambda,\ \theta \in \Theta \subset \mathcal{R}^d\}$ be a family of probability density functions.
Assume that $\mathbf{X} = (X_1, \ldots, X_n)$ is an independently and identically distributed (i.i.d.)
random sample from

$$p(x; \eta, \lambda, \theta) := (1 - \eta)\, f(x; \lambda) + \eta\, \psi(x; \theta), \qquad (1.1)$$

where $f(\cdot; \lambda)$ is a null density for an unknown parameter vector $\lambda \in \Lambda$, $\psi(\cdot; \theta)$ is a
perturbation density with an unknown nuisance parameter vector $\theta \in \Theta \subset \mathcal{R}^d$, both
defined on a sample space $\mathcal{X} \subset \mathcal{R}^s$ and $\eta \in [0, 1]$ is the size of the perturbation. In the
context of finite mixture models, the null model represents a mixture with m component 
densities and the perturbation model represents additional component densities. In the 
spatial scan process scenario, the null density accounts for the background or noise 
whereas the perturbation searches for an unusual activity. 
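
As a concrete illustration of the perturbation density, the following minimal sketch evaluates and samples from it for one assumed pair of densities: a standard normal null density and a unit-variance normal perturbation centered at a location parameter. The specific densities, the function names and the fixed seed are illustrative assumptions and are not taken from the article.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def perturbation_pdf(x, eta, theta):
    """(1 - eta) * f(x) + eta * psi(x; theta).

    Assumed example: f = N(0, 1) is the null density and
    psi(.; theta) = N(theta, 1) is the perturbation density.
    """
    return (1.0 - eta) * normal_pdf(x) + eta * normal_pdf(x, mean=theta)


def sample_perturbation(n, eta, theta, rng):
    """Draw n i.i.d. values: from psi with probability eta, otherwise from f."""
    return [rng.gauss(theta, 1.0) if rng.random() < eta else rng.gauss(0.0, 1.0)
            for _ in range(n)]


rng = random.Random(0)  # fixed seed, used only for reproducibility
sample = sample_perturbation(1000, eta=0.1, theta=3.0, rng=rng)
```

Setting the perturbation size to zero recovers the null density exactly, which is the situation described by the null hypothesis.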

The central idea is to introduce a perturbation parameter $\eta$ which creates a departure
from the null model. There are two primary goals: (1) estimation of the parameters in
the perturbation model and (2) testing the hypothesis

$$\mathcal{H}_0 : \eta = 0 \quad \text{against} \quad \mathcal{H}_1 : \eta > 0. \qquad (1.2)$$

Under $\mathcal{H}_0$, $p(\cdot; \eta, \lambda, \theta) = f(\cdot; \lambda)$ and the null model entirely describes the data. However,
under $\mathcal{H}_1$, the term $\eta\, \psi(\cdot; \theta)$ represents a departure from the null model.

The perturbation model falls into a class of problems studied by Davies (1977, 1987) 
in which a vector of nuisance parameters (in our case $\theta$) appears only under the alternative
hypothesis and standard asymptotic theory for the LRT breaks down. In particular,
the asymptotic behavior of the LRT for the testing problem (1.2) is very difficult to
characterize due to the difficulties with the geometry of the parameter space (scenarios (a)
and (b) discussed earlier). It is worth noting that this same set of problems occurs in
the context of testing for homogeneity in finite mixture models. The inferential theory
developed in this article requires only mild smoothness conditions on the family of densi- 
ties while being generic and applicable much more widely. The two most important and 
distinct statistical problems motivating this work are finite mixture models (Lindsay, 
1995) and spatial scan analysis (Glaz et al., 2001). 

1.2 Inference in Mixture Models 

Let $\mathcal{F} = \{\psi(x; \theta) : \theta \in \Theta \subset \mathcal{R}^d\}$ be a family of probability densities with respect to a
$\sigma$-finite dominating measure $\mu$ for an $s$-dimensional random vector $x \in \mathcal{X} \subset \mathcal{R}^s$ and let
$\mathcal{Q}$ be the space of all probability measures on $\Theta$ with the $\sigma$-field generated by its Borel
subsets. Assume that the component density $\psi(\cdot; \theta)$ is bounded in $\theta \in \Theta$.

Suppose that given $\theta$, a random variable $X$ has a density $\psi(x; \theta)$ and that $\theta$ follows
a distribution $Q$, referred to as the mixing distribution. For a given $Q \in \mathcal{Q}$, assume
that the sample arises from the marginal density $g(x; Q) := \int_\Theta \psi(x; \theta)\, dQ(\theta)$
for $x \in \mathcal{X} \subset \mathcal{R}^s$, referred to as a mixture density with a corresponding mixing measure
$Q$. In the case of a discrete and finitely supported mixing measure, the mixing distribution
can be expressed as $Q_m = \sum_{j=1}^m \beta_j\, \epsilon(\theta_j)$, where $\epsilon(\cdot)$ is a point mass function
and $\theta_1, \ldots, \theta_m$ are distinct support point vectors with a corresponding vector of mixing
weights $\beta := (\beta_1, \ldots, \beta_m)^T$ such that $\beta$ belongs to the interior of the unit simplex
$\{\beta : \sum_{j=1}^m \beta_j = 1,\ \beta_j \geq 0,\ j = 1, \ldots, m\}$. Therefore, the mixture density can be expressed
as $g(x; Q_m) = \sum_{j=1}^m \beta_j\, \psi(x; \theta_j)$, where the number of support points $m$ becomes the order
of the mixture model or mixture complexity. The probability distribution $\hat{Q}_m$ that
maximizes the loglikelihood $l(Q_m) = \sum_{i=1}^n \log[g(x_i; Q_m)]$ is the nonparametric maximum
likelihood estimator (NPMLE) of $Q_m$ (Lindsay, 1995).
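
The finite mixture density and its loglikelihood can be sketched in a few lines. The example below assumes a unit-variance normal component density, which is only one possible choice of the component family; the data values and function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x (assumed component family)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, supports):
    """Mixture density: sum over j of weight_j * psi(x; support_j)."""
    return sum(b * normal_pdf(x, t) for b, t in zip(weights, supports))


def mixture_loglik(data, weights, supports):
    """Loglikelihood of a discrete mixing distribution with the given
    mixing weights and support points."""
    return sum(math.log(mixture_pdf(x, weights, supports)) for x in data)


data = [-0.4, 0.1, 2.7, 3.2]  # hypothetical observations
one_component = mixture_loglik(data, [1.0], [0.0])
two_component = mixture_loglik(data, [0.5, 0.5], [0.0, 3.0])
```

For these hypothetical data the two-component loglikelihood exceeds the one-component value; deciding whether such an increase is statistically significant is exactly the order-testing problem discussed next.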

A long-pending and very challenging problem is determining the order $m$ of the mixture
model. In the perturbation model framework, if $f(\cdot; \lambda)$ represents the $m$-component
mixture density $g(\cdot; Q_m)$, then $\psi(\cdot; \theta_{m+1})$ represents the $(m+1)$st component density.
Therefore, inferential theory for perturbation models provides the machinery for testing
the order of a mixture model. If $m$ is fixed, the loglikelihood has multiple local maxima
and the LRT has an unknown limiting distribution. In the case of normal mean
mixtures and under severe identifiability conditions, Ghosh and Sen (1985) derived the
asymptotic null distribution of the LRT as

$$\sup_{\theta \in \Theta}\, [Z(\theta)]^2\, I[Z(\theta) > 0], \qquad (1.3)$$

where $Z(\theta)$ is a zero mean Gaussian process indexed by a set $\Theta$ with a specified covariance
function and $I[\cdot]$ is the indicator function. When the support set of certain parameters
in the model is unbounded (e.g., in normal and gamma mixtures), the LRT statistic can
diverge to infinity as $n \to \infty$ instead of having a limiting distribution (Hartigan, 1985;
Liu et al., 2003). This divergence of the LRT poses major difficulties in characterizing
the distribution of the LRT and in obtaining reliable simulation results for the null
distribution (Lindsay, 1995). For testing in multinomial mixture models, Lindsay (1995)
derived an approximation to the asymptotic distribution of the LRT based on the
Hotelling-Weyl (Hotelling, 1939; Weyl, 1939) volume-of-tube formula.

Existing theoretical results have been obtained only for some special cases and many 
researchers have considered simulation and resampling based approaches to approx- 
imate the asymptotic null distribution of the LRT for simple models; see Lindsay 
(1995) and McLachlan and Peel (2000) for detailed discussion and other references. 
Dacunha-Castelle and Gassiat (1999) proposed a general theory for the asymptotic null
distribution of the LRT in testing for $\mathcal{H}_0 : m = p$ mixtures against $\mathcal{H}_1 : m = q$ mixtures,
where $q > p$, using a locally conic parameterization. Under certain stringent conditions,
they showed that the asymptotic null distribution of the LRT statistic has a form similar
to (1.3); however, tail probability calculations required for calibrating the LRT statistic
are not derived. Unfortunately, analytic derivation of the distribution of the supremum of
a Gaussian process is a difficult problem. Most importantly, the issue of "singularities
of the process" (as described in Section 3.3) is of fundamental importance in the con- 
text of mixture testing problem and it has not been addressed in the existing literature, 
including by Dacunha-Castelle and Gassiat (1999). 

The perturbation theory developed in this article provides an elegant and flexible
machinery for approximating the quantiles of the test statistic for the following class of
fundamental problems: (1) testing problems in which the true parameter is on the
boundary of the hypotheses regions; (2) testing $\mathcal{H}_0$ : $m$-component mixture against
$\mathcal{H}_1$ : $(m + q)$-component mixture for $q = 1, 2, \ldots$ when mixtures are formed from any
smooth families, including discrete, continuous and multivariate densities; and (3) testing
for the presence of a signal when the probability density functions under the null and 
alternative hypotheses belong to different parametric families which occurs in physics 
applications (Pilla et al., 2005). 

1.3 Inference in Spatial Scan Statistics 

In the scan statistics problem, one observes a random field (such as a point process)
in a region of interest. The goal is to detect unusual behavior in subregions, where
the behavior of the field differs significantly from the background. Applications include
mammography, automatic target recognition, disease clustering and minefield detection.

In the classical formulation of the scan statistic (see Glaz et al. (2001) and the
references therein), a rectangular window is scanned across the data, with high values of
the statistic indicating a local departure from uniformity. In contrast, the methods
developed in this article are applicable to smooth scanning processes, where the window
is tapered, rather than having sharp boundaries. The null density $f(\cdot; \lambda)$ represents the
background model while the scan window $\psi(\cdot; \theta)$ represents departure from the
background at location $\theta$.

1.4 Main Results 

We create a general family of models referred to as perturbation models that encompass 
a large class of statistical problems. Our treatment of the nuisance parameters under the 
null hypothesis is quite general. The inferential theory developed in this article provides 
a solution to an important class of statistical problems involving loss of identifiability 
and/or when some of the parameters are on the boundary of the parametric space. The 
main contributions of this article follow.

1. In Section 2, we propose a novel test statistic based on the score process, denoted by 
T, for detecting the presence of perturbation and derive its fundamental properties. 
In particular, it is shown that the test statistic T based on the score process is 
asymptotically equivalent to the LRT statistic. 

2. In Section 3, we derive a general inferential theory for approximating the asymptotic
null distribution of $T$. It is shown that the asymptotic distribution of $T$
under $\mathcal{H}_0$ equals $\sup_\theta Z(\theta)$, where $Z(\theta)$ is a differentiable Gaussian random field
with continuous sample paths. Therefore, the goal becomes finding approximations
for $P(\sup_\theta Z(\theta) > c)$ for any large $c \in \mathcal{R}$ in order to determine the quantiles
of $T$. As eloquently pointed out by Adler (2000), this problem occurs in a large
number of different applications including in image processing (Worsley, 1995).
We describe a connection between $Z(\theta)$ and a differentiable manifold (curve, surface,
etc.) through the Karhunen-Loève expansion. The Karhunen-Loève expansion
converts the high-dimensional Gaussian probability problem into that of a
chi-squared random variable and uniformly distributed random variables over the
surfaces of spheres (Adler, 2000).






3. Our technique is based on the long-established and elegant geometric result known
as the volume-of-tube formula (Hotelling, 1939; Weyl, 1939; Naiman, 1990). The
problem of evaluating the Gaussian random field significance probabilities (i.e., tail 
probability for the asymptotic null distribution of T) for testing the hypothesis 
(1.2) is reduced to that of determining the volume-of-tube about a manifold on 
the surface of a hypersphere (see Section 3.2). The novelty here lies in deriving 
explicit expressions for the geometric constants appearing in the volume-of-tube 
formula with boundaries; consequently, one can approximate the quantiles of the 
statistic T for detecting the presence of perturbation. We also address the difficult 
and yet important problem of presence of singularities in the score process. 

4. In Section 4, the results of Section 3 are extended to the case where the null density 
is characterized by a vector of nuisance parameters. 

5. The long-standing and fundamental question of determining the order of a mixture model
is solved in Section 5. In particular, building on the perturbation theory, we develop
inferential methods for approximating the quantiles of the test statistic for
determining the mixture complexity. The flexibility and general applicability of the
methodology is demonstrated through univariate and multivariate mixture families.
Furthermore, it is shown that the results of Lindsay (1995), Lin (1997) and
Chen and Chen (2001) become special cases of our general and broadly applicable 
theory. 

The paper concludes with a discussion of the relative merits of the perturbation 
theory in Section 6. In Section 7, we derive the proofs of our general results. Explicit 
expressions for the geometric constants that appear in the volume-of-tube formula are 
derived in Appendix A. 

2 A Score Process and its Fundamental Properties 

In this section, we derive a score process and its fundamental properties that are required
for the testing problem (1.2). As a first step, we assume that $\lambda$ is fixed or known so that
$f(\cdot; \lambda)$ is completely specified and the density (1.1) can be expressed simply as $p(\cdot; \eta, \theta)$;
however, theory for the general case of an unknown $\lambda$ will be derived in Section 4.






2.1 Loglikelihood Ratio Process 

If $\theta$ is fixed at a particular value, then the testing problem (1.2) becomes routine.
However, the nuisance parameter vector $\theta$ can assume any value under $\mathcal{H}_0$; therefore,
the testing problem is non-regular. The loglikelihood function based on the perturbation
model (1.1) is $l(\eta, \theta | \mathbf{x}) = \sum_{i=1}^n \log[(1 - \eta) f(x_i; \lambda) + \eta\, \psi(x_i; \theta)]$. For a fixed $\theta$, $l(\eta, \theta | \mathbf{x})$
is a concave function of $\eta$ and hence there exists a unique maximizer $\hat{\eta}_\theta \in [0, 1]$. In
general, there is no closed form solution for $\hat{\eta}_\theta$; however, the estimator can be found as
a solution to

$$\frac{\partial}{\partial \eta}\, l(\eta, \theta | \mathbf{x}) = \sum_{i=1}^n \frac{\psi(x_i; \theta) - f(x_i; \lambda)}{(1 - \eta) f(x_i; \lambda) + \eta\, \psi(x_i; \theta)} = 0 \qquad (2.1)$$

if a solution in $(0, 1)$ exists; otherwise the estimator will be at one of the end-points.
This leads to a corresponding loglikelihood ratio process $l^*(\theta | \mathbf{x}) = l(\hat{\eta}_\theta, \theta | \mathbf{x}) - l(0, \theta | \mathbf{x})$.
Considered as a function of $\theta$, the process $l^*(\theta | \mathbf{x})$ may be used as a diagnostic tool,
with large values indicating the presence of perturbation. The maximum likelihood
estimator (MLE) of $\theta$ is the maximizer of $l^*(\theta | \mathbf{x})$. However, maximizing this process is
computationally intensive, since $l^*(\theta | \mathbf{x})$ may have many local maxima. Any strategy for
finding the global maximum has to involve an exhaustive search, which in turn requires
solving (2.1) for each fixed $\theta$. In the next section, we derive an alternative technique
that will combat these difficulties.
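
To make the computational burden concrete, the sketch below maximizes the loglikelihood over the perturbation size for one fixed value of the nuisance parameter, by bisection on the derivative of the loglikelihood with respect to the perturbation size (valid because the loglikelihood is concave in that argument). It assumes a standard normal null density and a unit-variance normal perturbation; these densities, the function names and the data are illustrative assumptions.

```python
import math


def ratio_minus_one(x, theta):
    """psi(x; theta) / f(x) - 1 for the assumed pair f = N(0, 1), psi = N(theta, 1)."""
    return math.expm1(theta * x - 0.5 * theta * theta)


def eta_hat(data, theta, tol=1e-12):
    """Maximizer over eta in [0, 1] of the loglikelihood for fixed theta."""
    d = [ratio_minus_one(x, theta) for x in data]

    def dl(eta):
        # Derivative of the loglikelihood in eta; decreasing in eta by concavity.
        return sum(di / (1.0 + eta * di) for di in d)

    if dl(0.0) <= 0.0:
        return 0.0  # maximum at the left end-point
    if dl(1.0) >= 0.0:
        return 1.0  # maximum at the right end-point
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dl(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def loglik_ratio(data, theta):
    """Loglikelihood ratio process at theta: l(eta_hat, theta) - l(0, theta)."""
    eta = eta_hat(data, theta)
    return sum(math.log1p(eta * ratio_minus_one(x, theta)) for x in data)


data = [3.0] * 5 + [0.0] * 5  # hypothetical sample: half near 3, half at 0
print(eta_hat(data, 3.0), loglik_ratio(data, 3.0))
```

An exhaustive search for the global maximum would repeat this inner root-finding step at every point of a grid over the nuisance parameter, which is the cost the score process avoids.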



2.2 The Score Process: Theory 



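In this section, we propose a novel technique based on a score process defined as

$$S(\theta) := \frac{\partial}{\partial \eta}\, l(\eta, \theta, \lambda | \mathbf{x}) \Big|_{\eta = 0} = \sum_{i=1}^n \left[ \frac{\psi(x_i; \theta)}{f(x_i; \lambda)} - 1 \right]. \qquad (2.2)$$

The interest is in the parameter vector $\theta$ and since $\lambda$ is fixed for now, for exposition, we
drop $\lambda$ from the expressions and simply write $S(\theta)$, $S^*(\theta)$, $Z(\theta)$, etc.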
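
The score process is the derivative of the loglikelihood with respect to the perturbation size, evaluated at zero, so it reduces to a sum of density ratios minus one and needs no inner optimization. A minimal sketch follows, assuming a standard normal null density and a unit-variance normal perturbation; the densities, names and data are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def f(x):
    """Assumed null density: N(0, 1)."""
    return normal_pdf(x, 0.0)


def psi(x, theta):
    """Assumed perturbation density: N(theta, 1)."""
    return normal_pdf(x, theta)


def score_process(data, theta, f, psi):
    """Sum over observations of psi(x_i; theta) / f(x_i) - 1."""
    return sum(psi(x, theta) / f(x) - 1.0 for x in data)


data = [-0.4, 0.1, 2.7, 3.2]              # hypothetical observations
grid = [0.5 * k for k in range(1, 9)]     # theta = 0.5, 1.0, ..., 4.0
scores = [score_process(data, t, f, psi) for t in grid]
```

Evaluating the process on a grid costs one pass over the data per grid point.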

The score process has several elegant features: (1) it is not as computationally intensive
as the likelihood ratio process and (2) its explicit representation makes statistical
inference tractable. It is shown in Theorem 1 (below) that the score process has mean
zero when there is no perturbation (i.e., $\eta = 0$) and $E[S(\theta_0)] > 0$ when there is a
perturbation at $\theta = \theta_0$, the true parameter vector. This suggests that peaks in the score
process provide evidence for the presence of perturbation. However, $S(\theta)$ can exhibit
high random variability and the variance may have substantial dependence on $\theta$. To
combat this difficulty, we propose the normalized score process defined as

$$S^*(\theta) := \frac{S(\theta)}{\sqrt{n\, C(\theta, \theta)}},$$

where the covariance function is defined as

$$C(\theta, \theta^\dagger) := \int \left[ \frac{\psi(x; \theta)}{f(x; \lambda)} - 1 \right] \left[ \frac{\psi(x; \theta^\dagger)}{f(x; \lambda)} - 1 \right] f(x; \lambda)\, dx = \int \frac{\psi(x; \theta)\, \psi(x; \theta^\dagger)}{f(x; \lambda)}\, dx - 1. \qquad (2.3)$$

The covariance function $C(\theta, \theta^\dagger)$ has an analytical expression for certain choices of $f(\cdot; \lambda)$
and $\psi(\cdot; \theta)$ while in other cases numerical integration is required.
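
When no closed form is available, the covariance function can be approximated by one-dimensional quadrature. The sketch below uses the trapezoidal rule for the integral of the product of two perturbation densities divided by the null density, minus one, and then standardizes the score by the square root of the sample size times the variance term. It assumes a standard normal null density and a unit-variance normal perturbation, for which a direct calculation gives the closed form exp(theta * theta') - 1, so the quadrature can be checked. The integration limits and step count are arbitrary illustrative choices.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def f(x):
    """Assumed null density: N(0, 1)."""
    return normal_pdf(x, 0.0)


def psi(x, theta):
    """Assumed perturbation density: N(theta, 1)."""
    return normal_pdf(x, theta)


def covariance(theta, theta2, f, psi, lo=-15.0, hi=15.0, steps=6000):
    """Trapezoidal approximation to int psi(x;theta) psi(x;theta2) / f(x) dx - 1."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * psi(x, theta) * psi(x, theta2) / f(x)
    return total * h - 1.0


def normalized_score(data, theta, f, psi):
    """Score divided by sqrt(n * C(theta, theta))."""
    s = sum(psi(x, theta) / f(x) - 1.0 for x in data)
    return s / math.sqrt(len(data) * covariance(theta, theta, f, psi))


print(covariance(1.0, 0.5, f, psi), math.exp(0.5) - 1.0)
```

The integration range must cover the region where the integrand is non-negligible; for other density families the limits would need to be chosen accordingly.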

The following conditions are assumed for deriving the large-sample theory. 

A1: The parameter space $\Theta$ is a compact and convex subset of $\mathcal{R}^d$ for some integer $d$.

A2: The covariance function satisfies $C(\theta, \theta) < \infty$ for all $\theta \in \Theta$.

A3: For each $\theta \in \Theta$, $\mathrm{supp}[\psi(\cdot; \theta)] \subset \mathrm{supp}[f(\cdot; \lambda)]$, where 'supp' refers to the support
of a density.

In the following theorem, we characterize some fundamental properties of the score 
and normalized score processes. 

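Theorem 1 Suppose assumptions A2 and A3 hold: (1) Under $\mathcal{H}_0$, the score process has
mean $E[S(\theta)] = 0$ for all $\theta$ with a covariance function $\mathrm{cov}[S(\theta), S(\theta^\dagger)] = n\, C(\theta, \theta^\dagger)$,
where $C(\theta, \theta^\dagger)$ is defined in (2.3); (2) under $\mathcal{H}_1$,

$$E[S(\theta)] = n\, \eta\, C(\theta, \theta_0); \qquad (2.4)$$

and (3) under $\mathcal{H}_1$, the expectation of the normalized score process is

$$E[S^*(\theta)] = \frac{\sqrt{n}\, \eta\, C(\theta, \theta_0)}{\sqrt{C(\theta, \theta)}} \leq \eta \sqrt{n\, C(\theta_0, \theta_0)} \qquad (2.5)$$

with equality at $\theta = \theta_0$.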
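Proof. Under $\mathcal{H}_1$, it follows that

$$E[S(\theta)] = n \int \left[ \frac{\psi(x; \theta)}{f(x; \lambda)} - 1 \right] p(x; \eta, \theta_0)\, dx = n\, \eta \int \left[ \frac{\psi(x; \theta)}{f(x; \lambda)} - 1 \right] \psi(x; \theta_0)\, dx = n\, \eta\, C(\theta, \theta_0),$$

which yields the result (2.4). Similarly, one can derive the mean and covariance functions
in part 1 of the theorem. The bound (2.5) is established by noting that $C(\theta, \theta_0)$ is a
covariance function and therefore satisfies the Cauchy-Schwarz inequality
$C(\theta, \theta_0) \leq \sqrt{C(\theta, \theta)\, C(\theta_0, \theta_0)}$.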


The motivation for using the score processes lies in part 3 of Theorem 1: The expectation
of $S^*(\theta)$ is maximized at $\theta_0$. Therefore, the supremum of the process $S^*(\theta)$ can
serve as a test statistic for the hypothesis (1.2). If $\mathcal{H}_0$ is rejected, then the maximizer
of $S^*(\theta)$ serves as a point estimator of $\theta$. The final result of this section establishes the
asymptotic equivalence between the score and loglikelihood processes; the proof is given
in Section 7.

Theorem 2 The score process and loglikelihood ratio process are asymptotically equivalent,
in the sense that $l^*(\theta | \mathbf{x}) = \frac{1}{2} [\max\{0, S^*(\theta)\}]^2 + o_p(1)$ as $n \to \infty$.

3 Testing for the Presence of Perturbation 

We first propose a statistic for the testing problem (1.2) and next derive its asymptotic
null distribution. From the motivation presented in the previous section, it is natural to
define a statistic for testing the hypothesis (1.2) as

$$T := \sup_{\theta \in \Theta} S^*(\theta). \qquad (3.1)$$

Except in special cases, the distribution of $T$ cannot be expressed analytically. Our next
goal is to derive an asymptotic distribution of $T$ under $\mathcal{H}_0$ for determining approximate
quantiles of the test statistic. As a first step, we establish that under $\mathcal{H}_0$ the distribution
of $T$ is asymptotically equivalent to the distribution of the supremum of a Gaussian
random field. Next, we derive approximations for the tail probability of the supremum
of a Gaussian random field using the Karhunen-Loève expansion and the volume-of-tube
formula.
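
In practice the supremum defining the test statistic has to be located numerically. The following sketch approximates it by a maximum over a finite grid, assuming a standard normal null density and a unit-variance normal perturbation, for which the variance term has the closed form exp(theta^2) - 1. The grid, its spacing and the data are hypothetical, and a grid maximum only bounds the true supremum from below.

```python
import math


def normalized_score(data, theta):
    """Normalized score for the assumed pair f = N(0, 1), psi = N(theta, 1)."""
    s = sum(math.expm1(theta * x - 0.5 * theta * theta) for x in data)
    return s / math.sqrt(len(data) * math.expm1(theta * theta))


def sup_statistic(data, grid):
    """Grid approximation to the supremum; returns (value, maximizing theta)."""
    values = [normalized_score(data, t) for t in grid]
    best = max(range(len(grid)), key=values.__getitem__)
    return values[best], grid[best]


# Grid over an assumed parameter space [0.5, 3.0]; zero is excluded because the
# variance term vanishes there for this pair of densities.
grid = [0.5 + 0.05 * i for i in range(51)]
t_stat, theta_at_sup = sup_statistic([0.0, 0.0, 0.0, 0.0], grid)
```

With all observations at the null mode, as in this toy input, the normalized score is negative over the whole grid, so the statistic gives no evidence of a perturbation.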

The volume-of-tube problem for curves (i.e., $d = 1$) was first studied by Hotelling
(1939) in the context of significance testing for nonlinear regression. In a second pioneering
paper, Weyl (1939) extended the work of Hotelling to higher-dimensional manifolds
(i.e., $d \geq 2$), deriving elegant expressions for the volume-of-tube of manifolds lying in a
hypersphere. Naiman (1990) further extended the Hotelling-Weyl results to cases where
the manifold has boundaries. Sun (1993) studied higher order terms for Gaussian processes
and fields. Important statistical problems to which the volume-of-tube formula
has been applied include non-linear regression (Hotelling, 1939; Knowles and Siegmund, 
1989), projection pursuit (Johansen and Johnstone, 1990), testing for multinomial mix- 
ture models (Lindsay, 1995; Lin, 1997), simultaneous confidence bands [Naiman (1987), 
Sun and Loader (1994) and Chapter 9 of Loader (1999)] and inference under convex 
cone alternatives for correlated data (Pilla, 2006). 

The following assumptions are required for the development of inferential theory. 

A4: For all $x \in \mathcal{X}$, the perturbation density $\psi(x; \theta)$ is twice differentiable, while



where $'$ denotes differentiation with respect to $\theta \in \Theta$. In the multi-parameter case,
all first and second-order partial derivatives are assumed to satisfy the integrability
condition as well.

A5: The covariance function $C(\theta, \theta)$ is positive in $\Theta$; equivalently, $f(\cdot; \lambda)$ is not identically
equal to $\psi(\cdot; \theta)$ for any $\theta \in \Theta$.

The assumption A5 fails in several important problems including mixture models, 
leading to singularities in the score process. In Section 3.3, we derive modifications to 
our theory to handle this difficult but important problem. 

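Let $\{Z(\theta) : \theta \in \Theta \subset \mathcal{R}^d\}$ be a $d$-dimensional differentiable Gaussian random field
with continuous sample paths, with mean zero and covariance function

$$\rho(\theta, \theta^\dagger) := E[Z(\theta) Z(\theta^\dagger)] = \frac{C(\theta, \theta^\dagger)}{\sqrt{C(\theta, \theta)\, C(\theta^\dagger, \theta^\dagger)}}. \qquad (3.2)$$
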
Under assumptions A4 and A5, the asymptotic null distribution of $T$ is the supremum
of a Gaussian random field, expressed explicitly as

$$Z(\theta) = [C(\theta, \theta)]^{-1/2} \int \frac{\psi(x; \theta) - f(x; \lambda)}{\sqrt{f(x; \lambda)}}\, W(dx),$$

where $W$ is the standard Brownian sheet.

Theorem 3 Suppose that assumptions A1 to A5 hold. Under $\mathcal{H}_0$,

$$P(T > c) \to P\left( \sup_{\theta \in \Theta} Z(\theta) > c \right) \quad \text{as } n \to \infty \text{ for any } c \in \mathcal{R}. \qquad (3.3)$$

Theorem 3 will be proved in Section 7. Generally, there is no exact result for finding
$P(\sup_\theta Z(\theta) > c)$ (Adler, 2000). The result of Theorem 3 holds even if we relax assumption
A4. Our proof relies only on the assumption of the first derivative of $\psi(x; \theta)$; however,
the second derivative conditions are required for the explicit probability approximations
derived later using the volume-of-tube formula.

The problem of approximating the distribution of the supremum of a smooth Gaussian
random field (i.e., finding $P(\sup_\theta Z(\theta) > c)$ for large $c$) can be addressed using
several different techniques: (1) methods based on the Hotelling-Weyl (Hotelling, 1939;
Weyl, 1939) volume-of-tube formula with boundary corrections (Naiman, 1990); (2)
expected Euler characteristic methods (Siegmund and Worsley, 1995; Worsley, 2001);
(3) approaches based on counting the local maxima and upcrossings; and (4) the Rice
formula (Siegmund and Zhang, 1993; Azais and Wschebor, 2005). All these techniques
lead to similar results for practical purposes (see Adler (2000) for discussion). Some
formal equivalence results between the tube formula and the expected Euler characteristic
methods have been derived by Takemura and Kuriki (2002). In this article, for the
development of inferential theory for perturbation models, we adopt the volume-of-tube
formula technique for its relatively simple geometric interpretation and the flexibility
to yield explicit results for higher-order boundary corrections. The disadvantage of the
tube approach is that it is directly applicable only to processes that are Gaussian or
Gaussian-like (Adler, 2000).
3.1 The Karhunen-Loève Expansion



In this section, we construct a sequence of finite-dimensional approximations to the Gaussian
random field $Z(\theta)$ using the Karhunen-Loève expansion. Although the Karhunen-Loève
expansion is most convenient, any other uniformly convergent approximation, such as a
cubic spline interpolant on a grid of $\theta$ values, is also applicable.

While some of the core ideas in this section are known, there does not exist a complete
statement of the results in the form that is required for the general testing problem
(1.2). In particular, addressing the following scenarios is of fundamental importance:
(1) $\Theta$ is a hyper-rectangle or a similar polygonal region with boundaries of various orders
(edges, corners and so on) and (2) the score process $S(\theta)$ has singularities.

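A concise presentation of the Karhunen-Loève expansion can be found in Section III.3
of Adler (1990). The Karhunen-Loève expansion of $Z(\theta)$ is the uniformly convergent
series expansion

$$Z(\theta) = \sum_{k=1}^{\infty} \mathfrak{z}_k\, \xi_k(\theta) = \langle \mathfrak{z}, \xi(\theta) \rangle, \qquad (3.4)$$

where the $\mathfrak{z}_k$ are i.i.d. standard Gaussian random variables, $\{\xi_k(\theta)\}_{k=1}^{\infty}$ is a sequence of twice
continuously differentiable functions, while $\mathfrak{z}$ and $\xi(\theta)$ are the corresponding vector
counterparts. The covariance function (3.2) can be explicitly expressed as

$$\rho(\theta, \theta^\dagger) = \sum_{k=1}^{\infty} \xi_k(\theta)\, \xi_k(\theta^\dagger) \qquad (3.5)$$

and $\mathfrak{z}_k = \mu_k^{-1} \int_\Theta \xi_k(\theta)\, Z(\theta)\, d\theta$, where $\mu_k = \int_\Theta \xi_k^2(\theta)\, d\theta$.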

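It is necessary for $Z(\theta)$ to have a finite Karhunen-Loève expansion for the application
of the volume-of-tube formula. When the expansion is infinite, the series is truncated
at $J$ terms to yield

$$Z_J(\theta) := \sum_{k=1}^{J-1} \mathfrak{z}_k\, \xi_k(\theta) + \mathfrak{z}_0 \sqrt{\sum_{k=J}^{\infty} \xi_k^2(\theta)} = \langle \mathfrak{z}_J, \xi_J(\theta) \rangle, \qquad (3.6)$$

where $\mathfrak{z}_0 \sim N(0, 1)$ and is independent of $\mathfrak{z}_1, \mathfrak{z}_2, \ldots$, $\mathfrak{z}_J = (\mathfrak{z}_0, \ldots, \mathfrak{z}_{J-1})^T$ and $\xi_J(\theta)$ is
the corresponding truncated version of the sequence $\{\xi_k(\theta)\}_{k=1}^{\infty}$. The covariance function
of $Z_J(\theta)$ can be expressed as

$$\rho_J(\theta, \theta^\dagger) = \sum_{k=1}^{J-1} \xi_k(\theta)\, \xi_k(\theta^\dagger) + \sqrt{\sum_{k=J}^{\infty} \xi_k^2(\theta) \sum_{k=J}^{\infty} \xi_k^2(\theta^\dagger)} = \langle \xi_J(\theta), \xi_J(\theta^\dagger) \rangle. \qquad (3.7)$$

The final term in (3.6) has been chosen to preserve unit variance; i.e., $V[Z_J(\theta)] = \rho_J(\theta, \theta) = 1$.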
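
The effect of truncation can be illustrated with an explicit series. For an assumed standard normal null density and unit-variance normal perturbation, the correlation function is (exp(theta * theta') - 1) divided by the square root of (exp(theta^2) - 1)(exp(theta'^2) - 1), and expanding the exponential gives a series of products of the functions xi_k(theta) = theta^k / sqrt(k! (exp(theta^2) - 1)). This power series has the product-sum form of the covariance expansion, but it is used here only as a convenient convergent expansion and is not claimed to be the Karhunen-Loève eigen-expansion. The truncated vector below places the remainder term first, followed by the first J - 1 terms, which is one reading of the indexing in the text.

```python
import math


def xi(k, theta):
    """k-th term (k >= 1) of the assumed power-series expansion."""
    return theta ** k / math.sqrt(math.factorial(k) * math.expm1(theta * theta))


def rho(theta, theta2):
    """Exact correlation function for the assumed normal example."""
    return math.expm1(theta * theta2) / math.sqrt(
        math.expm1(theta * theta) * math.expm1(theta2 * theta2))


def xi_truncated(theta, J):
    """J-vector: remainder term, then xi_1, ..., xi_{J-1}; has unit length."""
    head = [xi(k, theta) for k in range(1, J)]
    # The squared terms sum to one over all k, so the tail sum is 1 - sum(head^2).
    remainder = math.sqrt(max(0.0, 1.0 - sum(v * v for v in head)))
    return [remainder] + head


def rho_truncated(theta, theta2, J):
    """Inner product of the two truncated vectors."""
    a, b = xi_truncated(theta, J), xi_truncated(theta2, J)
    return sum(u * v for u, v in zip(a, b))


print(rho_truncated(1.0, 2.0, 5), rho_truncated(1.0, 2.0, 25), rho(1.0, 2.0))
```

The remainder term keeps the truncated vector on the unit sphere for every J, which is what allows the geometric argument of the next section to be applied to the truncated process.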

3.2 Distribution of the Supremum of $Z(\theta)$

In this section, we provide an approximation to the distribution of $\sup_\theta Z(\theta)$ under a very
general assumption that $\mathcal{M}$ is a manifold with a piecewise smooth boundary. This result, combined
with Theorem 3, provides an elegant approximation to the asymptotic null distribution
of the test statistic $T$. The primary goal is to approximate the asymptotic probability
in (3.3) when $c \in \mathcal{R}$ is large, $\theta \in \Theta \subset \mathcal{R}^d$ and $d \geq 1$.

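Conditioning on the length of the vector $\mathfrak{z}_J$,

$$P\left( \sup_{\theta \in \Theta} Z_J(\theta) > c \right) = P\left( \sup_{\theta \in \Theta} \langle \mathfrak{z}_J, \xi_J(\theta) \rangle > c \right) = P\left( \sup_{\theta \in \Theta} \left\langle \frac{\mathfrak{z}_J}{\|\mathfrak{z}_J\|}, \xi_J(\theta) \right\rangle > \frac{c}{\|\mathfrak{z}_J\|} \right) = \int_{c^2}^{\infty} P\left( \sup_{\theta \in \Theta} \langle U_J, \xi_J(\theta) \rangle > \frac{c}{\sqrt{y}} \right) h_J(y)\, dy, \qquad (3.8)$$

where the $J$-dimensional random vector $U_J = (\mathfrak{z}_0 / \|\mathfrak{z}_J\|, \ldots, \mathfrak{z}_{J-1} / \|\mathfrak{z}_J\|)^T$ is uniformly
distributed on the unit sphere $S^{(J-1)}$ embedded in $\mathcal{R}^J$, $\xi_J(\theta)$ is a curve in $S^{(J-1)}$ and
$h_J(y)$ is the $\chi^2$ density with $J$ degrees of freedom. Consequently, the goal becomes
evaluating the distribution of the supremum of a uniform process in (3.8).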

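First, note that the inner product $\langle U_J, \xi_J(\theta) \rangle$ is bounded by 1 (using the Cauchy-Schwarz
inequality) enabling the restriction of $c / \sqrt{y} \leq 1$ or $c^2 \leq y < \infty$. Since
$\|U_J - \xi_J(\theta)\|^2 = \|U_J\|^2 + \|\xi_J(\theta)\|^2 - 2 \langle U_J, \xi_J(\theta) \rangle = 2[1 - \langle U_J, \xi_J(\theta) \rangle]$, it follows that, for any
$w \in (0, 1)$, $\langle U_J, \xi_J(\theta) \rangle \geq w$ if and only if $\|U_J - \xi_J(\theta)\| \leq r := \sqrt{2(1 - w)}$. Therefore,

$$P\left( \sup_{\theta \in \Theta} \langle U_J, \xi_J(\theta) \rangle \geq w \right) = P\left( \inf_{\theta \in \Theta} \|U_J - \xi_J(\theta)\| \leq r \right) = P[U_J \in \mathfrak{T}(r, \mathcal{M})] = \frac{\vartheta(r, \mathcal{M})}{A_J}, \qquad (3.9)$$

where $\vartheta(r, \mathcal{M})$ denotes the volume of $\mathfrak{T}(r, \mathcal{M})$, a tube of radius $r$ around the manifold
$\mathcal{M} := \{\xi_J(\theta) : \theta \in \Theta \subset \mathcal{R}^d\}$, and $A_J = 2\pi^{J/2} / \Gamma(J/2)$ is the $(J - 1)$-dimensional volume
of the unit sphere $S^{(J-1)}$. The last expression follows since $U_J$ is uniformly distributed
over $S^{(J-1)}$.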

Remark 1: Finding the distribution of the supremum of a Gaussian random field $Z(\theta)$ is
now reduced to that of determining the volume-of-tube of the manifold $\mathcal{M}$. The solution
to this problem depends on the geometry of $\mathcal{M}$. When the set $\Theta$ is one-dimensional (i.e.,
$d = 1$) and $\xi_J(\theta)$ is continuous, then $\mathcal{M}$ is a curve on the unit sphere and the tube
consists of a main "cylindrical" section plus the two boundary caps as shown in Fig. 1.
In this case, results of Hotelling (1939) and Naiman (1990) yield the approximation

$$P\left( \sup_{\theta \in \Theta} \langle U_J, \xi_J(\theta) \rangle \geq w \right) \approx \frac{\kappa_0}{2\pi} (1 - w^2)^{(J-2)/2} + \frac{\ell_0}{4}\, P\left( B_{1/2,\, (J-1)/2} \geq w^2 \right),$$

where $\kappa_0$ is the length of the manifold $\mathcal{M}$, $B_{a,b}$ denotes a random variable having the beta
density with parameters $a$ and $b$ and $\ell_0 = 2$ is the number of end-points. Introducing $\ell_0$ allows us
to treat cases where $\mathcal{M}$ consists of two or more disconnected segments (due to singularities in the
score process), which is a common phenomenon in the context of mixture models. The
volume-of-tube formula is exact whenever $r$ is less than a critical radius $r_0$ (equivalently,
$w_0 \leq w \leq 1$) which depends on the curvature of $\mathcal{M}$.

Figure 1: Tube of radius $r$ around a one-dimensional manifold (curve) $\xi(\theta)$ with boundaries
embedded in the unit sphere.
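
The curve case of Remark 1 can be evaluated numerically. The sketch below assumes the approximation takes the form of a main term, (length / 2 pi) (1 - w^2)^((J-2)/2), plus an end-cap term, (number of end-points / 4) times the probability that a beta variable with parameters 1/2 and (J-1)/2 exceeds w^2; this form is a reconstruction from the geometry of the tube rather than a quotation. The beta tail is computed through the equivalent statement about the squared first coordinate of a uniform point on the sphere, using midpoint quadrature over the polar angle; function names are hypothetical.

```python
import math


def beta_tail(w, J, steps=2000):
    """P(B >= w^2) for B ~ Beta(1/2, (J-1)/2), 0 < w < 1, J >= 2.

    Uses the fact that the squared first coordinate of a uniform point on the
    unit sphere in J dimensions has this beta distribution.
    """
    c = math.gamma(J / 2.0) / (math.sqrt(math.pi) * math.gamma((J - 1) / 2.0))
    phi0 = math.acos(w)
    h = phi0 / steps
    integral = sum(math.sin((i + 0.5) * h) ** (J - 2) for i in range(steps)) * h
    return 2.0 * c * integral


def tube_probability(w, J, kappa0, ell0=2):
    """Assumed curve-case tube probability: cylindrical part plus end caps."""
    main = kappa0 / (2.0 * math.pi) * (1.0 - w * w) ** ((J - 2) / 2.0)
    caps = ell0 / 4.0 * beta_tail(w, J)
    return main + caps


# Check on a quarter of a great circle on the ordinary sphere (J = 3), where the
# tube area can be computed directly: sqrt(1 - w^2) / 4 + (1 - w) / 2.
w = 0.9
print(tube_probability(w, 3, math.pi / 2), 0.25 * math.sqrt(1 - w * w) + 0.5 * (1 - w))
```

The agreement in this special case reflects the exactness of the tube formula for small radii; for curves with high curvature the formula is only valid above the critical value of w.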

Application of the volume-of-tube formula to a Gaussian random field leads to the 
main result of this section. 
Theorem 4 Under assumptions A1 to A5, the distribution of $\sup_\theta Z(\theta)$ for a general
$d$ is given by

$$P\left( \sup_{\theta \in \Theta} Z(\theta) > c \right) = \sum_{t=0}^{d} \frac{\zeta_t}{A_{d+1-t}}\, P(\chi^2_{d+1-t} > c^2) + o[c^{-1} \exp(-c^2/2)], \qquad (3.10)$$

as $c \to \infty$, where $A_t = 2\pi^{t/2} / \Gamma(t/2)$ is the $(t - 1)$-dimensional volume of $S^{(t-1)}$ in $\mathcal{R}^t$
and $\zeta_t$ are the geometric constants derived in Appendix A.

Multinomial Mixture Problem: Equation (4.19) of Lindsay (1995), derived in the context
of multinomial mixture models, is a special case of Theorem 4 (see also Lin (1997) for
bounds). This connection is explored further in Section 5. It is important to note that
for multinomial mixture models, the Karhunen-Loève expansion is finite.

Remark 2: Although the proof of Theorem 4, derived in Section 7, uses the Karhunen-Loève
expansion, it is not necessary to find this expansion since one can determine
the geometric constants appearing in (3.10) entirely from the covariance function
$C(\theta, \theta^\dagger)$. However, it is necessary to consider the geometry of the manifold $\mathcal{M}$ in order
to treat the boundary corrections, particularly when $d \geq 2$.
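
For a one-dimensional parameter, the point of Remark 2 can be sketched directly: the length of the curve on the sphere is recoverable from the correlation function alone, because the squared chord between two points of the curve equals 2(1 - rho). The code below sums chord lengths over a fine grid and then evaluates a curve-case tail approximation of the form (length / 2 pi) exp(-c^2 / 2) + (end-points / 2) P(N(0,1) > c), which is what results from integrating the main and end-cap terms of the tube probability against the chi-squared density. The normal example, the interval [0.5, 3] and the function names are illustrative assumptions, and the expressions derived in Appendix A of the article may be organized differently.

```python
import math


def curve_length(rho, lo, hi, steps=2000):
    """Chord-sum approximation to the length of the curve theta -> xi(theta)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        a, b = lo + i * h, lo + (i + 1) * h
        # Squared chord between unit vectors with inner product rho(a, b).
        total += math.sqrt(max(0.0, 2.0 * (1.0 - rho(a, b))))
    return total


def sup_tail_approx(c, kappa0, ell0=2):
    """Curve-case approximation to P(sup Z > c) for large c."""
    upper_normal = 0.5 * math.erfc(c / math.sqrt(2.0))
    return kappa0 / (2.0 * math.pi) * math.exp(-0.5 * c * c) + 0.5 * ell0 * upper_normal


def rho_normal(theta, theta2):
    """Correlation function for the assumed pair f = N(0, 1), psi = N(theta, 1)."""
    return math.expm1(theta * theta2) / math.sqrt(
        math.expm1(theta * theta) * math.expm1(theta2 * theta2))


kappa0 = curve_length(rho_normal, 0.5, 3.0)
p_value = sup_tail_approx(3.0, kappa0)
```

For a closed curve with no end-points and length 2 pi, the approximation reduces to exp(-c^2 / 2), the exact tail of the norm of a two-dimensional standard Gaussian vector, which provides a simple sanity check.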



3.3 Singularities in the Score Process 

One of the conditions required for Theorem 4 is that $C(\theta, \theta)$ is positive for all $\theta \in \Theta$.
This condition is violated when $f(\cdot; \lambda) = \psi(\cdot; \theta)$ for some $\theta = \theta_0$. This is a commonly
occurring phenomenon in the context of finite mixture models. Therefore, we need to
consider more carefully the behavior of the score process near $\theta = \theta_0$. Let $S'(\theta) =
dS(\theta)/d\theta$ and $V[S'(\theta)]$ be the variance of $S'(\theta)$ so that $S(\theta) = (\theta - \theta_0) S'(\theta_0) + o(\theta - \theta_0)$,
$n\, C(\theta, \theta) = (\theta - \theta_0)^2\, V[S'(\theta_0)] + o[(\theta - \theta_0)^2]$ and $S^*(\theta) = \mathrm{sgn}(\theta - \theta_0)\, S'(\theta_0) / \sqrt{V[S'(\theta_0)]} +
o(\theta - \theta_0)$, where 'sgn' is the sign function. In particular, this implies that the process
"flips" and

$$\lim_{\theta \to \theta_0^-} S^*(\theta) = - \lim_{\theta \to \theta_0^+} S^*(\theta). \qquad (3.11)$$

Correspondingly, $\xi(\theta_0^-) = -\xi(\theta_0^+)$. In effect, the manifold $\mathcal{M}$ has two pieces and four
boundary points. The result in Theorem 4 still holds; however, $\ell_0 = 4$.
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4 Nuisance Parameters under the Null Model 



In this section, we derive general theory for the case of an unknown nuisance parameter
vector $\lambda$. We derive a series of fundamental results that provide a "linearization of the
score process" (defined below) to identify the correct covariance function (see Theorem
6 below) for this setting. We replace $\lambda$ by $\hat\lambda$, the MLE of $\lambda$, and assume that the
MLE satisfies the necessary regularity conditions stated by Chernoff (1954). Our goal
is to find an appropriate normalizing factor for the score process and in turn apply the
volume-of-tube formula for approximating the asymptotic null distribution of $T$.

In the context of finite mixture models, the null density $f(\cdot;\lambda)$ is equivalent to the
mixture density $g(\cdot; Q_m)$ representing an $m$-component mixture model with $\lambda = Q_m$
containing a vector of support points and the corresponding mixing weights. The score
process is searching for an $(m+1)$st component.

If $\lambda$ is estimated via the ML method, then under $\mathcal{H}_0$, the score process can be
expressed as

$$S(\theta|\hat\lambda) = \sum_{i=1}^{n}\left[\frac{\psi(x_i;\theta)}{f(x_i;\hat\lambda)} - 1\right].$$

The statistic $T$ will still be the supremum (over $\theta$) of the normalized score process;
however, estimating the nuisance parameter vector $\lambda$ means that the covariance function
$C(\theta,\theta^\dagger)$ defined in (2.3) is no longer appropriate for normalizing the score process.

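As a first step, it is assumed that the MLE $\hat\lambda$ under $\mathcal{H}_0$ satisfies the required conditions
for the second-order asymptotic theory (Lehmann, 1999). Hence, the following
results hold:

$$\sqrt{n}\,(\hat\lambda - \lambda_0) = n^{-1/2}\,[I(\lambda_0)]^{-1}\sum_{i=1}^{n}\nabla\ell(\lambda_0|x_i) + o_p(1) \tag{4.1}$$

and $n^{-1/2}\sum_{i=1}^{n}\nabla\ell(\lambda_0|x_i) \xrightarrow{d} N[0, I(\lambda_0)]$ as $n \to \infty$, where $\lambda_0$ is the true null parameter
vector, $\xrightarrow{d}$ indicates convergence in distribution, $I(\lambda_0)$ is the Fisher information matrix
and $\nabla\ell(\lambda|x)$ is the vector of partial derivatives of $\ell(\lambda|x) = \log f(x;\lambda)$ with respect to
$\lambda$.

Theorem 5 Suppose that assumptions A2 to A5 hold. Under $\mathcal{H}_0$ with the true null
parameter vector $\lambda_0$, the score process has the asymptotic representation of

$$S(\theta|\hat\lambda) = S(\theta|\lambda_0) - C^T(\theta|\lambda_0)\,[I(\lambda_0)]^{-1}\sum_{i=1}^{n}\nabla\ell(\lambda_0|x_i) + o_p(n^{1/2}),$$

where $o_p(n^{1/2})$ is uniform in $\theta$ and $C(\theta|\lambda_0)$ is the covariance vector defined as

$$C(\theta|\lambda_0) := \mathrm{cov}\left[\frac{\psi(x_i;\theta)}{f(x_i;\lambda_0)} - 1,\ \nabla\ell(\lambda_0|x_i)\right] = \int \psi(x;\theta)\,\nabla\ell(\lambda_0|x)\,dx.$$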

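Proof. By expanding the score process in a Taylor series around $\lambda_0$, we obtain

$$S(\theta|\hat\lambda) = S(\theta|\lambda_0) + (\hat\lambda - \lambda_0)^T\,\frac{\partial}{\partial\lambda}S(\theta|\lambda)\Big|_{\lambda=\tilde\lambda},$$

where $\tilde\lambda \in [\lambda_0, \hat\lambda]$. Direct calculation shows that

$$\frac{1}{n}\,\frac{\partial}{\partial\lambda}S(\theta|\lambda)\Big|_{\lambda=\tilde\lambda} = -\frac{1}{n}\sum_{i=1}^{n}\frac{\psi(x_i;\theta)}{f(x_i;\tilde\lambda)}\,\nabla\ell(\tilde\lambda|x_i).$$

From the uniform strong law of large numbers and the fact that $\tilde\lambda \to \lambda_0$, it follows that

$$\frac{1}{n}\,\frac{\partial}{\partial\lambda}S(\theta|\lambda)\Big|_{\lambda=\tilde\lambda} \to -\mathrm{cov}\left[\frac{\psi(x_i;\theta)}{f(x_i;\lambda_0)} - 1,\ \nabla\ell(\lambda_0|x_i)\right] = -C(\theta|\lambda_0) \quad\text{as } n \to \infty.$$

It follows from assumption A1 and continuity that the convergence is uniform
in $\theta$. Combining this result with (4.1) completes the proof. ■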



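Theorem 6 The process

$$n^{-1/2}\,S(\theta|\lambda_0) - n^{-1/2}\,C^T(\theta|\lambda_0)\,[I(\lambda_0)]^{-1}\sum_{i=1}^{n}\nabla\ell(\lambda_0|x_i) \tag{4.2}$$

has the covariance function

$$C^*(\theta,\theta^\dagger) = C(\theta,\theta^\dagger) - C^T(\theta|\lambda_0)\,[I(\lambda_0)]^{-1}\,C(\theta^\dagger|\lambda_0), \tag{4.3}$$

where $C(\theta,\theta^\dagger)$ is defined in (2.3) with $f(\cdot;\lambda_0)$ replacing $f(\cdot;\lambda)$.

Proof. The result follows immediately from the observations that

$$n^{-1}\,\mathrm{cov}\big[S(\theta|\lambda_0),\ S(\theta^\dagger|\lambda_0)\big] = C(\theta,\theta^\dagger), \qquad n^{-1}\,\mathrm{cov}\Big[\sum_{i=1}^{n}\nabla\ell(\lambda_0|x_i)\Big] = I(\lambda_0)$$

and

$$n^{-1}\,\mathrm{cov}\Big[S(\theta|\lambda_0),\ \sum_{i=1}^{n}\nabla\ell(\lambda_0|x_i)\Big] = C(\theta|\lambda_0). \qquad ■$$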

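A6: Suppose $C^*(\theta,\theta^\dagger)$ is continuous and $0 < C^*(\theta,\theta) < \infty$ for all $\theta \in \Theta$.

Theorem 7 Under assumptions A2 through A6,

$$\sup_{\theta\in\Theta}\frac{S(\theta|\hat\lambda)}{\sqrt{n\,C^*(\theta,\theta)}} \xrightarrow{d} \sup_{\theta\in\Theta} Z^*(\theta) \quad\text{as } n \to \infty,$$

where $Z^*(\theta)$ is a Gaussian random field with the covariance function

$$\rho^*(\theta,\theta^\dagger) = \frac{C^*(\theta,\theta^\dagger)}{\sqrt{C^*(\theta,\theta)\,C^*(\theta^\dagger,\theta^\dagger)}}.$$

Proof. First, the result holds for the process (4.2) (which is similar to Theorem 3). Next,
the result follows from Theorem 6. ■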

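We apply the results of Theorem 4 to the case of one-dimensional $\theta$:

Theorem 8 The tail probability is expressed as

$$P\Big(\sup_{\theta\in\Theta} Z^*(\theta) > c\Big) = \frac{\kappa_0}{2\pi}\,P(\chi^2_2 \ge c^2) + \frac{\ell_0}{4}\,P(\chi^2_1 > c^2) + o\big[c^{-1}\exp(-c^2/2)\big]$$

with

$$\kappa_0 = \int_{\Theta}\left[\frac{\partial^2\rho^*(\theta,\theta^\dagger)}{\partial\theta\,\partial\theta^\dagger}\bigg|_{\theta^\dagger=\theta}\right]^{1/2} d\theta$$

and $\ell_0 = 2$.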
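
As a numerical illustration of the one-dimensional approximation, the following Python sketch evaluates the two leading terms and solves for a critical value by bisection. It relies only on the standard library, using the identities $P(\chi^2_2 > c^2) = \exp(-c^2/2)$ and $P(\chi^2_1 > c^2) = \mathrm{erfc}(c/\sqrt{2})$. The function names and the bisection bracket are illustrative assumptions, and the remainder term is ignored.

```python
import math

def tail_prob_1d(c, kappa0, ell0):
    """Leading terms of the d = 1 tube approximation to P(sup Z > c)."""
    chi2_2_tail = math.exp(-0.5 * c * c)          # P(chi^2_2 > c^2)
    chi2_1_tail = math.erfc(c / math.sqrt(2.0))   # P(chi^2_1 > c^2) = P(|N(0,1)| > c)
    return kappa0 / (2.0 * math.pi) * chi2_2_tail + ell0 / 4.0 * chi2_1_tail

def critical_value_1d(alpha, kappa0, ell0, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for c with tail_prob_1d(c) = alpha (the tail decreases in c)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_prob_1d(mid, kappa0, ell0) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Constants quoted in Section 5.5 for the one-component fit.
c = critical_value_1d(0.05, kappa0=5.72, ell0=4)
```

With $\kappa_0 = 5.72$ and $\ell_0 = 4$, the solver should return a value close to the critical value $c = 2.518$ reported in Section 5.5.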



The covariance function and $\kappa_0$ depend on $\lambda_0$; hence, they cannot be evaluated directly.
However, replacing $\lambda_0$ by $\hat\lambda$, a consistent estimator of $\lambda_0$, yields consistent estimators of both. Just as in the case of a
fixed $\lambda$, the condition $C^*(\theta,\theta) > 0$ for all $\theta \in \Theta$ (part of assumption A6) will be violated
in the context of finite mixture models. However, the singularities cannot be handled in
a unified fashion and they are best treated on a case-by-case basis. In particular, (1) there
may be multiple singularities, corresponding to each component of the mixture model
under $\mathcal{H}_0$ and (2) in some cases the singularities lead to discontinuities (as described
earlier) while in other cases the singularities are removable.

5 Testing for the Order of a Mixture Model



In this section, building on the perturbation theory, we derive results for the long-pending
problem of testing for the order of a mixture model while achieving the following
goals: (1) demonstrating how the existing results for a special class of mixtures can be
derived from our general theory, (2) obtaining explicit and flexible expressions for the
geometric constants in the asymptotic tail probability and (3) carefully examining
the singularities of the score process that routinely occur in mixture models.



5.1 Mixtures of Binomial Distributions 

Discrete mixtures for a random variable $X$ assuming a finite set of values (e.g., $0, \ldots, b$)
are of special interest, since the data can be summarized by the bin counts $N_0, \ldots, N_b$.
The loglikelihood and the score process $S(\theta)$ depend on the data only through these
values. After appropriate centering and scaling, it is easy to verify that the bin counts
have an asymptotic $b$-variate multivariate normal distribution. Consequently, the score
process $S(\theta)$ must have a finite Karhunen-Loève expansion.

Consider the case of $b = 2$ and a mixture of Binomial$(2, \theta)$ distributions with $\theta \in
[0, 1]$. That is, our interest is in testing $\mathcal{H}_0: \eta = 0$ against $\mathcal{H}_1: \eta > 0$ and $\psi(x,\theta)$ is
assumed to have a Binomial$(2, \theta)$ distribution expressed as

$$\psi(x,\theta) = \begin{cases} (1-\theta)^2 & \text{if } x = 0\\ 2\theta(1-\theta) & \text{if } x = 1\\ \theta^2 & \text{if } x = 2 \end{cases}$$

with the null density $\psi(\cdot,\lambda)$ for some $\lambda \in [0, 1]$. Therefore, the perturbation model can
be expressed as $p(x; \eta, \lambda, \theta) = (1-\eta)\,\psi(x,\lambda) + \eta\,\psi(x,\theta)$.

Case 1: Assume $\lambda$ is known and $\theta$ is unknown. The score process is

$$S(\theta) = N_0\,\frac{(1-\theta)^2}{(1-\lambda)^2} + N_1\,\frac{\theta(1-\theta)}{\lambda(1-\lambda)} + N_2\,\frac{\theta^2}{\lambda^2} - n.$$

Since $N_1 = (n - N_0 - N_2)$, the score process reduces to

$$n^{-1/2}\,S(\theta) = Z_0\left[\frac{(1-\theta)^2}{(1-\lambda)^2} - \frac{\theta(1-\theta)}{\lambda(1-\lambda)}\right] + Z_2\left[\frac{\theta^2}{\lambda^2} - \frac{\theta(1-\theta)}{\lambda(1-\lambda)}\right] = c_0(\theta)\,Z_0 + c_2(\theta)\,Z_2,$$

where $Z_0 = n^{-1/2}[N_0 - n(1-\lambda)^2]$ and $Z_2 = n^{-1/2}(N_2 - n\lambda^2)$. The vector $[c_0(\theta), c_2(\theta)]^T$
traces a smooth curve through the origin at $\theta = \lambda$. The normalized score process $S^*(\theta)$
has the flip property discussed earlier.

The random variables $Z_0$ and $Z_2$ are correlated; hence, an explicit representation of $S(\theta)$
in terms of uncorrelated random variables is quite messy. However, the corresponding
manifold $\mathcal{M}$ consists of two arcs on the unit circle and the one-dimensional volume of the
tube is $\kappa_0 = \cos^{-1}(\tau_0) + \cos^{-1}(\tau_1)$, where $\tau_0 = \mathrm{cor}[S(0), -S'(\lambda)]$, $\tau_1 = \mathrm{cor}[S(1), S'(\lambda)]$
and $\ell_0 = 4$. Note that $\tau_0$ and $\tau_1$ can be evaluated explicitly based on $\mathbb{V}(Z_0) = (1-\lambda)^2\,\lambda\,(2-\lambda)$,
$\mathrm{cov}(Z_0, Z_2) = -\lambda^2(1-\lambda)^2$ and $\mathbb{V}(Z_2) = \lambda^2(1-\lambda)(1+\lambda)$. After some
algebra, it is easy to verify that $\tau_0 = \sqrt{2\lambda/(1+\lambda)}$ and $\tau_1 = \sqrt{2(1-\lambda)/(2-\lambda)}$. Since
$\mathcal{M}$ consists of two arcs on a unit circle, the exact asymptotic null distribution of $T$ is
obtained using the method of Uusipaikka (1983).

Case 2: Assume that both $\lambda$ and $\theta$ are unknown. Consider the MLE of $\lambda$, $\hat\lambda = (N_1 +
2N_2)/(2n) = (n + N_2 - N_0)/(2n)$, so that $Z_0 = Z_2 = (N_0 + N_2)/2 - n/4 - (N_2 - N_0)^2/(4n)$
and $S(\theta|\hat\lambda) = Z_0\,(\theta - \hat\lambda)^2/[\hat\lambda^2(1-\hat\lambda)^2]$. In this case, the normalized score process is
constant and hence the manifold $\mathcal{M}$ consists of a single point. Therefore, $\kappa_0 = 0$
and $\ell_0 = 2$, resulting in a distribution of $(0.5\,\chi^2_0 + 0.5\,\chi^2_1)$, where $\chi^2_0$ is a degenerate
distribution with all its mass at zero. This is the special case derived by Lindsay (1995,
p. 95). Shapiro (1985) referred to this mixture of chi-square distributions with differing
degrees of freedom as the chi-bar distribution.
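
The closed form for the score process in Case 2 can be checked numerically. The sketch below, which uses hypothetical bin counts and illustrative function names, computes $S(\theta|\hat\lambda)$ directly from the Binomial(2, $\theta$) probabilities and compares it with $Z_0(\theta-\hat\lambda)^2/[\hat\lambda^2(1-\hat\lambda)^2]$, where $Z_0$ is taken without the $n^{-1/2}$ scaling, as in the expression for Case 2.

```python
def psi(x, theta):
    """Binomial(2, theta) probability mass function for x in {0, 1, 2}."""
    return ((1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2)[x]

def lambda_hat(counts):
    """MLE of lambda under the null: (N1 + 2 N2) / (2 n)."""
    n0, n1, n2 = counts
    return (n1 + 2 * n2) / (2 * (n0 + n1 + n2))

def score_direct(counts, theta):
    """S(theta | lambda_hat) = sum_x N_x psi(x; theta) / psi(x; lambda_hat) - n."""
    lam = lambda_hat(counts)
    total = sum(nx * psi(x, theta) / psi(x, lam) for x, nx in enumerate(counts))
    return total - sum(counts)

def score_closed_form(counts, theta):
    """Z0 (theta - lambda_hat)^2 / [lambda_hat^2 (1 - lambda_hat)^2]."""
    n0, n1, n2 = counts
    n = n0 + n1 + n2
    lam = lambda_hat(counts)
    z0 = (n0 + n2) / 2 - n / 4 - (n2 - n0) ** 2 / (4 * n)
    return z0 * (theta - lam) ** 2 / (lam ** 2 * (1 - lam) ** 2)

counts = (30, 45, 25)  # hypothetical bin counts (N0, N1, N2)
gaps = [abs(score_direct(counts, t) - score_closed_form(counts, t))
        for t in (0.1, 0.5, 0.9)]
```

Because the two expressions agree for every $\theta$, the normalized process does not depend on $\theta$, which is why the manifold collapses to a single point.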

5.2 Mixtures of Exponential Family of Densities 

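Suppose that $\psi(x;\theta)$ belongs to an exponential family of densities so that $\psi(x;\theta) =
\exp[\theta^T x - \varphi(\theta)]\,\psi_0(x)$. The null density is $f(\cdot;\lambda)$ for some $\lambda$.

Case of Fixed $\lambda$: The covariance function becomes

$$C(\theta,\theta^\dagger) = \int \exp[(\theta + \theta^\dagger - \lambda)^T x + \varphi(\lambda) - \varphi(\theta) - \varphi(\theta^\dagger)]\,\psi_0(x)\,dx - 1$$

$$= \exp[\varphi(\theta + \theta^\dagger - \lambda) + \varphi(\lambda) - \varphi(\theta) - \varphi(\theta^\dagger)] - 1.$$

If $\psi(\cdot;\theta)$ has a multivariate normal distribution with mean vector $\theta$ and an identity
variance-covariance matrix, it follows that $\varphi(\theta) = \|\theta\|^2/2$ and

$$C(\theta,\theta^\dagger) = \exp[\langle\theta - \lambda,\ \theta^\dagger - \lambda\rangle] - 1. \tag{5.1}$$

Consider the special case of $d = 1$. The critical values are obtained using Theorem 4
and the one-dimensional volume of $\mathcal{M}$ has the following explicit expression when $\lambda = 0$:

$$\kappa_0 = \int_{\Theta}\frac{[\exp(2\theta^2) - (1+\theta^2)\exp(\theta^2)]^{1/2}}{\exp(\theta^2) - 1}\,d\theta.$$

The normalized score process again has the flip property (3.11) and $\ell_0 = 4$.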
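
For the unit-variance normal family with $\lambda = 0$ and $d = 1$, equation (5.1) gives $C(\theta,\theta^\dagger) = \exp(\theta\theta^\dagger) - 1$. The sketch below is an illustration under that assumption: it compares the integrand $[\exp(2\theta^2) - (1+\theta^2)\exp(\theta^2)]^{1/2}/[\exp(\theta^2)-1]$, obtained by applying (A.2) to this covariance, with a finite-difference evaluation of the mixed second derivative of the correlation function, and then integrates it by Simpson's rule. The function names, step size and integration interval are hypothetical choices.

```python
import math

def rho(t, s):
    """Correlation function of the normalized score for C(t, s) = exp(t s) - 1."""
    return math.expm1(t * s) / math.sqrt(math.expm1(t * t) * math.expm1(s * s))

def integrand_closed(t):
    """[exp(2 t^2) - (1 + t^2) exp(t^2)]^(1/2) / [exp(t^2) - 1], for t != 0."""
    e = math.exp(t * t)
    return math.sqrt(e * e - (1.0 + t * t) * e) / math.expm1(t * t)

def integrand_numeric(t, h=1e-3):
    """sqrt of d^2 rho/(dt ds) at s = t by central differences, using rho(t, t) = 1."""
    return math.sqrt((2.0 - 2.0 * rho(t + h, t - h)) / (4.0 * h * h))

def kappa0(a, b, n=2000):
    """Composite Simpson rule (n even) over an interval [a, b] that excludes 0."""
    step = (b - a) / n
    total = integrand_closed(a) + integrand_closed(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand_closed(a + k * step)
    return total * step / 3.0

length_right_piece = kappa0(0.5, 3.0)  # hypothetical interval to the right of the flip
```

The integrand is undefined at exactly $\theta = 0$, the location of the flip, although it tends to the finite limit $1/\sqrt{2}$ there; the two pieces on either side of the singularity are therefore integrated separately.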

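Case of Unknown $\lambda$: Straightforward calculations show that the covariance function
(4.3) in Theorem 6 becomes

$$C^*(\theta,\theta^\dagger) = C(\theta,\theta^\dagger) - [\varphi'(\theta) - \varphi'(\lambda)]^T\,[\varphi''(\lambda)]^{-1}\,[\varphi'(\theta^\dagger) - \varphi'(\lambda)],$$

since $C(\theta|\lambda) = E_\theta[X - \varphi'(\lambda)] = \varphi'(\theta) - \varphi'(\lambda)$ and $I(\lambda) = \varphi''(\lambda)$.

In the case of a univariate normal distribution, the volume of the one-dimensional
manifold becomes

$$\kappa_0 = \int_{\Theta}\frac{\big[\exp\{2(\theta-\lambda)^2\} + 1 - \exp\{(\theta-\lambda)^2\}\{2 + (\theta-\lambda)^4\}\big]^{1/2}}{\exp\{(\theta-\lambda)^2\} - 1 - (\theta-\lambda)^2}\,d\theta.$$

The normalized score process has a singularity at $\theta = \hat\theta$; however, the precise behavior
at this point needs careful consideration, which is presented next. In the neighborhood
of $\hat\theta$, we have

$$S(\theta) = S(\hat\theta) + (\theta - \hat\theta)\,S'(\hat\theta) + \tfrac{1}{2}(\theta - \hat\theta)^2\,S''(\hat\theta) + o[(\theta - \hat\theta)^2]. \tag{5.2}$$

Note that $S(\hat\theta) = S'(\hat\theta) = 0$ (since the latter is simply the score equation defining
$\hat\theta$). By continuity, $S''(\hat\theta) = S''(\lambda) + o(1)$; hence, the normalized score process becomes
$S''(\lambda)/\sqrt{\mathbb{V}[S''(\lambda)]} + o(1)$ in the neighborhood of $\lambda$. This is continuous so there is no flip
at $\theta = \hat\theta$. The manifold $\mathcal{M}$ for this process is a single segment and $\ell_0 = 2$.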



5.3 Testing for m versus (m + q) Component Mixture Model 

One of the important applications of the perturbation theory is in building finite mixture
models formed from a broad class of smooth densities. First, consider testing

$\mathcal{H}_0$: $m$-component mixture against $\mathcal{H}_1$: $(m+1)$-component mixture

when mixtures are formed from any smooth families, including discrete, continuous and
multivariate densities. Under the mixture model framework, the null model $f(\cdot;\lambda)$ is
the $m$-component mixture $g(x; Q_m) = \sum_{j=1}^{m}\beta_j\,\psi(x;\theta_j)$, where $Q_m = (\theta^T, \beta^T)^T$, while
the alternative adds an $(m+1)$st component. We consider two cases: (1) the support
point vectors $\theta$s are fixed and only the mixing weight vector $\beta$ is estimated and (2) $\theta$s
and $\beta$ are estimated.

Case 1: Assume $\theta$ is fixed and the goal is to estimate $\beta$. The likelihood surface is
concave in $\beta$ and the MLEs satisfy

$$S(\theta_j|\hat\beta) = \sum_{i=1}^{n}\frac{\psi(x_i;\theta_j)}{g(x_i;\hat Q_m)} - n = 0 \quad\text{for all } j = 1,\ldots,m \tag{5.3}$$

provided that the solution satisfies $0 < \hat\beta_j < 1$ (otherwise, some components are set to
zero). The MLE satisfies the conditions of Section 4, provided that $\beta_j > 0$ for each $j$.
The covariance function is determined based on the result in Theorem 6.

The set of equations in (5.3) implies that the normalized score process has a singularity
at each $\theta_j$. Using an argument similar to (3.11), the process flips at each of these
points.

Case 2: The goal is to estimate both $\theta$ and $\beta$. Note that each support point is of
dimension $d$. The equations defining the MLEs become

$$S(\hat\theta_j|\hat Q_m) = 0 \quad\text{and}\quad S'(\hat\theta_j|\hat Q_m) = 0 \tag{5.4}$$

for all $j = 1, \ldots, m$. Note that for $d > 1$, the above equation is a vector. Using an
expansion similar to (5.2), around each of the true support points, it is easy to verify
that all the singularities in the normalized score process are removable.

Consistent estimators of the nuisance parameters are required to apply the results
of Section 4. This is achieved by imposing an order constraint on the support point
vectors $\theta_j$ and a corresponding constraint on the estimators. Under these constraints,
the approximate critical values are obtained from Theorems 6 and 8.

General case: Consider the more general problem of testing $\mathcal{H}_0$: $m$-component mixture
against $\mathcal{H}_1$: $(m+q)$-component mixture for $q = 1, 2, \ldots$. For this case, Theorem 4 is
still applicable and the score process is easy to derive (see Pilla and Loader (2005) for
details). Suppose $d = 1$ and $\Theta$ is an interval, then the manifold has two corner points
and two edges with two boundary faces as shown in Fig. 2.





Figure 2: Manifold for testing m versus (m + 2) components in mixture models. The 
manifold has two corners, two edges and two boundary faces. 

5.4 Mixtures of Bivariate Normal Distributions 

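In this section, we consider the bivariate mixture testing problem so that $d = 2$, $x =
(x_1,x_2)^T$ and $\theta = (\theta_1,\theta_2)^T$. To the best of the authors' knowledge, this is the first attempt
at testing for mixtures of multivariate distributions. Assume $f(\cdot;\lambda)$ is a bivariate
standard normal density and $\psi(\cdot;\theta)$ is a bivariate normal density with mean $\theta$ and an
identity covariance matrix. From equation (5.1), it is easy to verify that the covariance
function can be explicitly expressed as $C(\theta,\theta^\dagger) = \exp[\langle\theta,\theta^\dagger\rangle] - 1$. Suppose $\Theta$ is a disk
of radius $\varrho_1 > 0$, so that

$$T = \sup_{0 < \|\theta\| \le \varrho_1}\frac{S(\theta)}{\sqrt{n\,C(\theta,\theta)}}.$$

In order to address the singularity at $\|\theta\| = 0$, first consider the supremum over $\varrho_0 \le
\|\theta\| \le \varrho_1$, where $0 < \varrho_0 < \varrho_1$, and next let $\varrho_0 \to 0$. Under the polar coordinate
parameterization of $\theta = [\varrho\cos(\omega), \varrho\sin(\omega)]$ with the covariance function expressed as
$C(\theta,\theta^\dagger) = \exp[\varrho\varrho^\dagger\cos(\omega - \omega^\dagger)] - 1$, it follows that

$$\kappa_0 = \int_{\varrho_0}^{\varrho_1}\!\!\int_0^{2\pi}\{\exp(\varrho^2) - 1\}^{-3/2}\,\det\begin{bmatrix}\exp(\varrho^2) - 1 & \varrho\exp(\varrho^2) & 0\\ \varrho\exp(\varrho^2) & (1+\varrho^2)\exp(\varrho^2) & 0\\ 0 & 0 & \varrho^2\exp(\varrho^2)\end{bmatrix}^{1/2} d\omega\,d\varrho$$

$$= 2\pi\int_{\varrho_0}^{\varrho_1}\left[\frac{\varrho^2\exp(3\varrho^2) - \varrho^2(1+\varrho^2)\exp(2\varrho^2)}{\{\exp(\varrho^2) - 1\}^3}\right]^{1/2} d\varrho.$$

The integrand has a finite limit as $\varrho \to 0$; therefore, the integral is still valid when
$\varrho_0 = 0$.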

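Next, we consider the boundaries at $\varrho = \varrho_0$ and $\varrho = \varrho_1$. For an arbitrary $\varrho$, the
length of the boundary is

$$\ell(\varrho) = \int_0^{2\pi}[C(\theta,\theta)]^{-1}\,\det\begin{bmatrix}\exp(\varrho^2) - 1 & 0\\ 0 & \varrho^2\exp(\varrho^2)\end{bmatrix}^{1/2} d\omega = 2\pi\left[\frac{\varrho^2\exp(\varrho^2)}{\exp(\varrho^2) - 1}\right]^{1/2}.$$

Therefore,

$$\ell_0 = 2\pi\left[\frac{\varrho_0^2\exp(\varrho_0^2)}{\exp(\varrho_0^2) - 1}\right]^{1/2} + 2\pi\left[\frac{\varrho_1^2\exp(\varrho_1^2)}{\exp(\varrho_1^2) - 1}\right]^{1/2} \to 2\pi\left\{1 + \left[\frac{\varrho_1^2\exp(\varrho_1^2)}{\exp(\varrho_1^2) - 1}\right]^{1/2}\right\}$$

as $\varrho_0 \to 0$. The contribution from the inner boundary does not disappear as $\varrho_0 \to 0$;
instead it converges to $2\pi$. This implies that the manifold $\mathcal{M}$ corresponding to this
process has a hole and $\mathcal{M}$ has an Euler-Poincaré characteristic of $\mathcal{E} = 0$. The tail-probability
approximation of Theorem 4 simplifies to

$$P\Big(\sup_{\theta\in\Theta} Z(\theta) > c\Big) \approx \frac{\kappa_0}{2\sqrt{2}\,\pi^{3/2}}\,c\,\exp(-c^2/2) + \frac{\ell_0}{4\pi}\,\exp(-c^2/2) \quad\text{as } c \to \infty.$$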
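
The constants for the disk problem are straightforward to evaluate numerically. The following sketch assumes the covariance $C(\theta,\theta^\dagger) = \exp[\langle\theta,\theta^\dagger\rangle] - 1$, the radial form of the $\kappa_0$ integrand $\varrho\exp(\varrho^2)[\exp(\varrho^2) - 1 - \varrho^2]^{1/2}/[\exp(\varrho^2)-1]^{3/2}$ (an algebraically equivalent rewriting of the determinant expression), and an Euler-Poincaré characteristic of zero. Function names, the Simpson grid size and the small-radius cutoff are illustrative choices.

```python
import math

def kappa0_integrand(rho):
    """Radial integrand of kappa_0 for the disk problem (2*pi factor excluded)."""
    r2 = rho * rho
    if r2 < 1e-8:
        return 1.0 / math.sqrt(2.0)               # limiting value as rho -> 0
    em1 = math.expm1(r2)                          # exp(rho^2) - 1
    return rho * math.exp(r2) * math.sqrt(em1 - r2) / em1 ** 1.5

def kappa0_disk(rho1, n=2000):
    """kappa_0 = 2*pi * integral over [0, rho1], composite Simpson rule (n even)."""
    step = rho1 / n
    total = kappa0_integrand(0.0) + kappa0_integrand(rho1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * kappa0_integrand(k * step)
    return 2.0 * math.pi * total * step / 3.0

def boundary_length(rho):
    """2*pi*[rho^2 exp(rho^2) / (exp(rho^2) - 1)]^(1/2), for rho > 0."""
    r2 = rho * rho
    return 2.0 * math.pi * math.sqrt(r2 * math.exp(r2) / math.expm1(r2))

def tail_prob_disk(c, rho1):
    """kappa_0/(2*pi)^(3/2) * c * exp(-c^2/2) + ell_0/(4*pi) * exp(-c^2/2)."""
    ell0 = boundary_length(rho1) + 2.0 * math.pi  # outer circle plus the inner hole
    e = math.exp(-0.5 * c * c)
    return kappa0_disk(rho1) / (2.0 * math.pi) ** 1.5 * c * e + ell0 / (4.0 * math.pi) * e
```

Evaluating `boundary_length` at a very small radius illustrates the point made in the text: the inner boundary length tends to $2\pi$ rather than to zero.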



The interior hole occurs in any two-parameter problem, as the next lemma demonstrates.

Lemma 1 Suppose $\theta$ is of dimension $d = 2$ and there exists a $\lambda$ such that $f(\cdot;\lambda) =
\psi(\cdot;\theta_0)$. The normalized score process $S^*(\theta)$ has a singularity at $\theta = \theta_0$ and correspondingly,
the manifold $\mathcal{M}$ has a hole. The length of the interior boundary of $\mathcal{M}$ is
$2\pi$.

Figure 3: Manifold for the bivariate normal mixture testing problem. The cylindrical
manifold has two boundaries: a circle with circumference $2\pi$, corresponding to $\varrho = 0$,
and a larger (high dimensional) ring corresponding to $\varrho = \varrho_1$.



Proof. A Taylor series expansion yields

$$S(\theta) = \langle\theta - \theta_0,\ S'(\theta_0)\rangle + o(\|\theta - \theta_0\|) \quad\text{as } \theta \to \theta_0.$$

Let $R$ be a matrix such that $\mathrm{cov}[S'(\theta_0)] = n\,R^T R$. Then the normalized score process
becomes

$$S^*(\theta) = \left\langle\frac{R(\theta - \theta_0)}{\|R(\theta - \theta_0)\|},\ n^{-1/2}\,R^{-T}S'(\theta_0)\right\rangle + o(1).$$

As $\theta$ varies in a small circle around $\theta_0$, the boundary of the manifold $\mathcal{M}$, $R(\theta -
\theta_0)/\|R(\theta - \theta_0)\|$, becomes the unit circle in $\mathbb{R}^2$ which has length $2\pi$. ■

For $d = 1$, the manifold $\mathcal{M}$ has $(m + 1)$ segments so that $\ell_0 = 2(m+1)$. Approximate
critical values are obtained based on Theorem 4 and $\kappa_0$ is evaluated using numerical
integration. For $d = 2$, the manifold $\mathcal{M}$ has $m$ holes with each hole contributing $2\pi$ to
the total length of the boundary $\ell_0$. The Euler-Poincaré characteristic of $\mathcal{M}$ is therefore
$(1 - m)$. For the result in Theorem 4, the constant $\kappa_0$ and the length of the outer
boundary are found using bivariate and univariate numerical integration, respectively.



5.5 Simulation Experiments 

In order to demonstrate the power of the proposed methods, we present two simulation
studies and illustrate the process of building mixture models.

We consider the simulated dataset shown in Fig. 4(a), consisting of a sample of size
$n = 100$ drawn from the two-component normal mixture model $0.5\,N(-2, 1) + 0.5\,N(2, 1)$.

Figure 4: (a) Simulated data from the two-component normal mixture: $0.5\,N(-2, 1) +
0.5\,N(2, 1)$; (b) $S^*(\theta)$ for the one-component mixture model; (c) $S^*(\theta)$ for the two-component
mixture model; (d) $S^*(\theta)$ for the model with a third component included
and the first component removed.

The model building process starts with the first component at the sample mean $\hat\theta_1 =
\bar{x} = 0.20322$. The starting model is obviously a poor fit for the dataset. Fig. 4(b)
presents the fitted normalized score process $S^*(\theta)$, showing two peaks in the vicinity of
the true mixture components. An application of the volume-of-tube formula in (3.10)
to this model yields $\kappa_0 = 5.72$ and $\ell_0 = 4$ with the critical value of $c = 2.518$ at the 5%
level. Clearly the peaks are highly significant. A second component at $\hat\theta_2 = -1.68929$
[the location of the larger left peak in Fig. 4(b)] is included in the model and the vector
of estimated mixing weights is $\hat\beta = (0.67315, 0.32685)^T$. The incorrect first component
$\hat\theta_1$ still dominates the fitted mixture model.
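
The first step of this model-building process can be sketched in a few lines of Python. The example below is an illustration only, assuming a unit-variance normal null whose mean is estimated by the sample mean, so that $\psi(x;\theta)/\psi(x;\hat\lambda) = \exp[u(x-\hat\lambda) - u^2/2]$ with $u = \theta - \hat\lambda$, and using the normalizing covariance $C^*(\theta,\theta) = \exp(u^2) - 1 - u^2$ for the univariate normal case. The function name, seed and grid are hypothetical, and the simulated sample is not the dataset analyzed in the text.

```python
import math
import random

def normalized_score(data, theta):
    """S*(theta) for a N(lambda, 1) null with lambda estimated by the sample mean."""
    n = len(data)
    lam = sum(data) / n
    u = theta - lam
    # Likelihood ratio psi(x; theta) / psi(x; lam) for unit-variance normals.
    score = sum(math.exp(u * (x - lam) - 0.5 * u * u) for x in data) - n
    c_star = math.expm1(u * u) - u * u            # C*(theta, theta)
    return score / math.sqrt(n * c_star)

# Hypothetical sample from 0.5 N(-2, 1) + 0.5 N(2, 1).
random.seed(0)
sample = [random.gauss(random.choice((-2.0, 2.0)), 1.0) for _ in range(100)]
grid = [-4.0 + 0.05 * k + 0.0123 for k in range(160)]   # offset avoids theta = mean
peak_value, peak_theta = max((normalized_score(sample, t), t) for t in grid)
```

The value `peak_value` would then be compared with the critical value obtained from the tube formula. The process is undefined at exactly $\theta = \bar{x}$, where $C^*(\theta,\theta) = 0$.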

The normalized score process relative to the two-component mixture is shown in
Fig. 4(c). The striking feature of this plot is the two discontinuities at the fitted
components $\hat\theta_1 = 0.203$ and $\hat\theta_2 = -1.689$. These discontinuities occur due to the zeroes
of the covariance function since $C^*(\hat\theta_1,\hat\theta_1) = C^*(\hat\theta_2,\hat\theta_2) = 0$, which in turn corresponds
to the singularities in $S^*(\theta)$. The manifold $\mathcal{M}$ for this process has three pieces so that
$\kappa_0 = 5.082$ and $\ell_0 = 6$. The critical value is $c = 2.571$ and the right peak is still highly
significant. The maximum occurs at $\hat\theta_3 = 2.07328$, which is included as a third component
in the model. Since $\hat\beta = (0, 0.45616, 0.54384)^T$, the first component is removed from the
model. For the two-component mixture model with $\hat\theta_2$ and $\hat\theta_3$, the constants are $\kappa_0 = 5.082$
and $\ell_0 = 6$, yielding $c = 2.571$. Fig. 4(d) presents the process $S^*(\theta)$ and it is entirely
below the critical value $c$; therefore, the two-component mixture model with $\hat\theta_2$ and $\hat\theta_3$
is the final fitted model.

The true density is chosen as $p(x; \eta, \theta) = 0.5(1-\eta)\,\psi(x; -2) + \eta\,\psi(x; 0) + 0.5(1-\eta)\,\psi(x; 2)$
for $\eta \in \{0, 0.1, 0.2\}$, where $\psi(\cdot;\theta)$ is the normal density with mean $\theta$ and unit
variance. This density has two large, well-separated components and our goal is to
test for the presence of the poorly separated third component. We present simulation
studies using 1000 data sets under the following three different scenarios:
Model 1: $f(x;\lambda) = g(x; Q) = [0.5\,\psi(x; -2) + 0.5\,\psi(x; 2)]$ is completely specified.
Model 2: $f(x;\lambda) = g(x; Q) = [\beta_1\,\psi(x; -2) + \beta_2\,\psi(x; 2)]$, where $\beta_1$ and $\beta_2$ are estimated.
Model 3: $f(x;\lambda) = g(x; Q) = [\beta_1\,\psi(x; \theta_1) + \beta_2\,\psi(x; \theta_2)]$, where $\beta$s and $\theta$s are estimated.

Table 1: Rejection rates for three different null models under three different perturbation
sizes based on 1000 simulation studies.

                    n = 200                            n = 1000
Model    η = 0.0    η = 0.1    η = 0.2      η = 0.0    η = 0.05    η = 0.1
1           79        537        975           74        636         990
2           78        583        985           76        673         992
3           74        292        588           61        371         817

Table 1 presents the number of rejections out of 1000 simulations under two sample sizes.
When $\eta = 0$, $\mathcal{H}_0$ is true and hence we expect the rejection rate to be close to the
nominal significance level of 5%. As $\eta$ increases, the power increases as expected. As
the null assumptions are relaxed, the power decreases, which again is to be expected.
The poor separation between the components makes it difficult for the test to detect
the third component, an effect that is more prominent for Model 3. Naturally, estimating the
nuisance parameters under the null model has an effect on the power of the test.
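
To read the entries of Table 1 as rates, each count can be divided by the 1000 replications and paired with a binomial Monte Carlo standard error. The short sketch below does this; the dictionary layout and function name are illustrative, and the counts are transcribed from Table 1.

```python
import math

N_SIM = 1000  # number of simulated data sets per table cell

# Rejection counts keyed by (model, sample size); columns follow Table 1.
counts = {
    (1, 200): (79, 537, 975), (1, 1000): (74, 636, 990),
    (2, 200): (78, 583, 985), (2, 1000): (76, 673, 992),
    (3, 200): (74, 292, 588), (3, 1000): (61, 371, 817),
}

def rate_and_se(k, n_sim=N_SIM):
    """Rejection rate k / n_sim and its binomial Monte Carlo standard error."""
    p = k / n_sim
    return p, math.sqrt(p * (1.0 - p) / n_sim)

rates = {key: [rate_and_se(k) for k in row] for key, row in counts.items()}
```

For example, the first entry (Model 1, $n = 200$, $\eta = 0$) corresponds to a rate of 0.079 with a Monte Carlo standard error of roughly 0.0085.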

6 Discussion 

In this article, we introduced a general class of models, perturbation models, and pro- 
posed a test statistic (asymptotically equivalent to the LRT statistic) based on the score 
process to detect the presence of perturbation. We derived general inferential theory 
for the asymptotic null distribution of the test statistic for a class of non-regular prob- 
lems using the Hotelling-Weyl-Naiman volume-of-tube formula. The resulting theory 
is extended to solve the long-pending fundamental problem of testing for the mixture 
complexity, including the case when the null model includes a set of nuisance param- 
eters. Our theory is applicable to a general family of mixture models including the 
multivariate family of mixtures. Other applications to the general theory include spatial 
scan analysis, latent class models (employed in social research) and Rasch models (em- 
ployed in educational testing and survey sampling). The inferential theory developed 
in this article provides a solution to an important class of statistical problems involving 
loss of identifiability and/or when some of the parameters are on the boundary of the 
parametric space. 

The explicit determination of the geometric constants appearing in the tube formula
is carried out using the Libtube software (Loader, 2005). Our theory is general enough
to be applicable to scalar or vector A and univariate or multivariate data. The advantage 
of our approach is that the tube formula provides an elegant approximation to the 
asymptotic null distribution compared to those based on simulations or bootstrap based 
procedures. 

7 Proofs 

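In this section we provide proofs of the main theorems. As before, the prime notation $'$ is used to
denote the derivative with respect to the appropriate term.

Proof of Theorem 2. Let

$$K(\eta,\theta) = \sum_{i=1}^{n}\log\left[1 + \frac{\eta\{\psi(x_i;\theta) - f(x_i;\lambda)\}}{f(x_i;\lambda)}\right].$$

The LRT statistic becomes $\sup_{\theta,\,\eta\ge 0} K(\eta,\theta)$. For any $\eta > 0$, a Taylor series expansion
yields

$$K\Big(\frac{\eta}{\sqrt{n}},\theta\Big) = K(0,\theta) + \frac{\eta}{\sqrt{n}}\,K'(0,\theta) + \frac{\eta^2}{2n}\,K''(\eta^*,\theta) \quad\text{for } 0 < \eta^* < \frac{\eta}{\sqrt{n}}$$

$$= \frac{\eta}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{\psi(x_i;\theta)}{f(x_i;\lambda)} - 1\right] - \frac{\eta^2}{2n}\sum_{i=1}^{n}\frac{\{\psi(x_i;\theta) - f(x_i;\lambda)\}^2/\{f(x_i;\lambda)\}^2}{[1 + \eta^*\{\psi(x_i;\theta)/f(x_i;\lambda) - 1\}]^2}.$$

Under an implicit assumption that convergence statements are uniform in $\theta$ for bounded
sets and from the results in Rubin (1956), it follows that

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\{\psi(x_i;\theta) - f(x_i;\lambda)\}^2/\{f(x_i;\lambda)\}^2}{[1 + \eta^*\{\psi(x_i;\theta)/f(x_i;\lambda) - 1\}]^2}$$

is uniformly converging to $C(\theta,\theta)$. Therefore,

$$K\Big(\frac{\eta}{\sqrt{n}},\theta\Big) = \frac{\eta}{\sqrt{n}}\,S(\theta) - \frac{\eta^2}{2}\,C(\theta,\theta) + o_p(1),$$

where the $o_p(1)$ term is uniform in $\eta$ and $\theta$ on compact sets. In effect, $\sup_{\eta\ge 0} K(\eta/\sqrt{n},\theta) =
(1/2)\max\{0, S^*(\theta)\}^2 + o_p(1)$. ■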

On the way to proving Theorem 3, we derive a series of technical results. 

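Lemma 2 Let $a(\theta)$ be a continuously differentiable function on an interval $\Theta$. Let
$\Delta = [a(\theta_1) - a(\theta_0)]$. Then

$$\int_{\theta_0}^{\theta_1}[a'(\theta)]^2\,d\theta \ge \frac{\Delta^2}{|\theta_1 - \theta_0|},$$

where $a'(\theta) = da(\theta)/d\theta$.

Proof. Let $\theta_\Delta = (\theta_1 - \theta_0)$ so that

$$\int_{\theta_0}^{\theta_1}[a'(\theta)]^2\,d\theta = \int_{\theta_0}^{\theta_1}\Big[a'(\theta) - \frac{\Delta}{\theta_\Delta}\Big]^2 d\theta + \int_{\theta_0}^{\theta_1}\Big(\frac{\Delta}{\theta_\Delta}\Big)^2 d\theta + \frac{2\Delta}{\theta_\Delta}\int_{\theta_0}^{\theta_1}\Big[a'(\theta) - \frac{\Delta}{\theta_\Delta}\Big]d\theta.$$

Note that the first integral is non-negative and the third one is zero. ■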



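Lemma 3 Suppose $\theta_0 < \theta_2$ and $a(\theta_0) = a(\theta_2) = 0$, then

$$\int_{\theta_0}^{\theta_2}[a'(\theta)]^2\,d\theta \ge \frac{4}{|\theta_2 - \theta_0|}\Big(\sup_{\theta_0 < \theta < \theta_2}|a(\theta)|\Big)^2.$$

Proof. Suppose the supremum occurs at $(\theta_1, a_1)$ with $\theta_0 < \theta_1 < \theta_2$. An application of
Lemma 2 separately over $[\theta_0,\theta_1]$ and $[\theta_1,\theta_2]$ yields

$$\int_{\theta_0}^{\theta_2}[a'(\theta)]^2\,d\theta = \int_{\theta_0}^{\theta_1}[a'(\theta)]^2\,d\theta + \int_{\theta_1}^{\theta_2}[a'(\theta)]^2\,d\theta \ge a_1^2\left[\frac{1}{\theta_1 - \theta_0} + \frac{1}{\theta_2 - \theta_1}\right] \ge \frac{4\,a_1^2}{(\theta_2 - \theta_0)}. \qquad ■$$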



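Lemma 4 Suppose $b(\theta)$ is continuously differentiable. For $\delta > 0$, let $b_\delta(\theta)$ be the linear
interpolant between the points $0, \pm\delta, \pm 2\delta, \ldots$. Then

$$\sup_{\theta\in\Theta}|b_\delta(\theta) - b(\theta)|^2 \le \delta\int_{\Theta}[b'(\theta)]^2\,d\theta.$$

Proof. Once again, let $a_1$ be the supremum. An application of Lemma 3 to $a(\theta) =
[b_\delta(\theta) - b(\theta)]$ yields

$$a_1^2 \le \frac{\delta}{4}\int_{\Theta}[b_\delta'(\theta) - b'(\theta)]^2\,d\theta \le \frac{\delta}{2}\int_{\Theta}\big\{[b_\delta'(\theta)]^2 + [b'(\theta)]^2\big\}\,d\theta \le \delta\int_{\Theta}[b'(\theta)]^2\,d\theta.$$

The final inequality holds since $\int_\Theta[b_\delta'(\theta)]^2\,d\theta \le \int_\Theta[b'(\theta)]^2\,d\theta$; this follows from the application
of Lemma 2 between each pair of knots of $b_\delta(\cdot)$. ■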
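
The interpolation bound of Lemma 4, $\sup_\theta|b_\delta(\theta) - b(\theta)|^2 \le \delta\int[b'(\theta)]^2 d\theta$, can be illustrated numerically. The sketch below evaluates both sides on a fine grid for a smooth test function; the choice $b(\theta) = \sin\theta$ on $[0, 6]$ with $\delta = 0.1$, the midpoint rule and the function name are assumptions made purely for illustration, and the grid supremum only approximates the true supremum.

```python
import math

def interp_bound_check(b, b_prime, delta, n_knots, m=200):
    """Return (grid sup |b_delta - b|^2, delta * integral of b'^2) on [0, n_knots*delta]."""
    sup_err = 0.0
    integral = 0.0
    h = delta / m
    for k in range(n_knots):
        left = k * delta
        b_left, b_right = b(left), b(left + delta)
        for j in range(m):
            t = left + (j + 0.5) * h              # midpoint of each sub-cell
            interp = b_left + (b_right - b_left) * (t - left) / delta
            sup_err = max(sup_err, abs(interp - b(t)))
            integral += b_prime(t) ** 2 * h       # midpoint rule for the integral
    return sup_err ** 2, delta * integral

lhs, rhs = interp_bound_check(math.sin, math.cos, delta=0.1, n_knots=60)
```

For a smooth function the left-hand side is of order $\delta^4$ while the right-hand side is of order $\delta$, so the bound is far from tight; its value in the proofs is that it requires only square-integrability of the derivative.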

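Lemma 5 Let $Y(\theta)$ be a stochastic process with continuously differentiable sample paths
and let $Y_\delta(\theta)$ be its linear interpolant between points $0, \pm\delta, \ldots$. Then

$$P\Big(\sup_{\theta\in\Theta}|Y_\delta(\theta) - Y(\theta)| > \epsilon\Big) \le \frac{\delta}{\epsilon^2}\,E\int_{\Theta}[Y'(\theta)]^2\,d\theta.$$

Uniform convergence holds if the expectation is finite:

$$\lim_{\delta\to 0} P\Big(\sup_{\theta\in\Theta}|Y_\delta(\theta) - Y(\theta)| > \epsilon\Big) = 0 \quad\text{for all } \epsilon > 0. \tag{7.1}$$

Proof. From Lemma 4, it follows that

$$P\Big(\sup_{\theta\in\Theta}|Y_\delta(\theta) - Y(\theta)| > \epsilon\Big) \le P\Big(\delta\int_{\Theta}[Y'(\theta)]^2\,d\theta > \epsilon^2\Big) \le \frac{\delta}{\epsilon^2}\,E\int_{\Theta}[Y'(\theta)]^2\,d\theta,$$

where the last inequality follows from Markov's inequality for any non-negative random
variable. ■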

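Lemma 6 If $Y_\delta(\theta)$ converges uniformly to $Y(\theta)$, as defined in (7.1), then

$$\lim_{\delta\to 0} P\Big(\sup_{\theta\in\Theta} Y_\delta(\theta) > c\Big) = P\Big(\sup_{\theta\in\Theta} Y(\theta) > c\Big)$$

for any $c$ where the right hand side is continuous.

Proof. For any $\epsilon > 0$,

$$P\Big(\sup_{\theta\in\Theta} Y_\delta(\theta) > c\Big) \ge P\Big(\sup_{\theta\in\Theta} Y(\theta) > c + \epsilon\Big) - P\Big(\sup_{\theta\in\Theta}|Y_\delta(\theta) - Y(\theta)| > \epsilon\Big).$$

Consequently, $\liminf_{\delta\to 0} P(\sup_{\theta\in\Theta} Y_\delta(\theta) > c) \ge P(\sup_{\theta\in\Theta} Y(\theta) > c + \epsilon)$. However,
since $\epsilon$ is arbitrary,

$$\liminf_{\delta\to 0} P\Big(\sup_{\theta\in\Theta} Y_\delta(\theta) > c\Big) \ge P\Big(\sup_{\theta\in\Theta} Y(\theta) > c\Big).$$

By a similar argument, it follows that

$$\limsup_{\delta\to 0} P\Big(\sup_{\theta\in\Theta} Y_\delta(\theta) > c\Big) \le P\Big(\sup_{\theta\in\Theta} Y(\theta) > c\Big),$$

which completes the proof. ■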

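Proof of Theorem 3. First, convergence of finite-dimensional distributions is a consequence
of the multivariate central limit theorem. Since a linear interpolant is always
maximized at one of the knots, this implies that the theorem holds for a linear interpolant:

$$\lim_{n\to\infty} P\Big(\sup_{\theta\in\Theta} S^*_\delta(\theta) > c\Big) = P\Big(\sup_{\theta\in\Theta} Z_\delta(\theta) > c\Big)$$

for any $\delta > 0$. For any $\epsilon > 0$, Lemma 5 implies that

$$P\Big(\sup_{\theta\in\Theta} S^*(\theta) > c\Big) \le P\Big(\sup_{\theta\in\Theta} S^*_\delta(\theta) > c - \epsilon\Big) + P\Big(\sup_{\theta\in\Theta}|S^*(\theta) - S^*_\delta(\theta)| > \epsilon\Big)$$

$$\le P\Big(\sup_{\theta\in\Theta} S^*_\delta(\theta) > c - \epsilon\Big) + \frac{\delta}{\epsilon^2}\,E\int_{\Theta}[Z'(\theta)]^2\,d\theta,$$

where $Z'(\theta) = dZ(\theta)/d\theta$. The last step follows from the fact that $S^*$ and $Z$ have
the same covariance function. Assumption A4 implies that the expectation is finite.
From the convergence of finite-dimensional distributions, it follows that

$$\limsup_{n\to\infty} P\Big(\sup_{\theta\in\Theta} S^*(\theta) > c\Big) \le P\Big(\sup_{\theta\in\Theta} Z_\delta(\theta) > c - \epsilon\Big) + \frac{\delta}{\epsilon^2}\,E\int_{\Theta}[Z'(\theta)]^2\,d\theta.$$

First, let $\delta \to 0$ and apply Lemma 6 to $Z_\delta$. Next, let $\epsilon \to 0$ to obtain

$$\limsup_{n\to\infty} P\Big(\sup_{\theta\in\Theta} S^*(\theta) > c\Big) \le P\Big(\sup_{\theta\in\Theta} Z(\theta) > c\Big).$$

A similar argument shows that

$$\liminf_{n\to\infty} P\Big(\sup_{\theta\in\Theta} S^*(\theta) > c\Big) \ge P\Big(\sup_{\theta\in\Theta} Z(\theta) > c\Big),$$

which completes the proof. ■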

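Proof of Theorem 4. We assume the regularity conditions 1 to 4 in Adler (2000). The
integral in (3.8) can be expressed as

$$\int_{c^2}^{\infty} P\Big(\sup_{\theta\in\Theta}\langle U_J,\xi_J(\theta)\rangle > \frac{c}{\sqrt{y}}\Big)h_J(y)\,dy = \int_{c^2}^{c^2/w_0^2} P\Big(\sup_{\theta\in\Theta}\langle U_J,\xi_J(\theta)\rangle > \frac{c}{\sqrt{y}}\Big)h_J(y)\,dy$$

$$\qquad + \int_{c^2/w_0^2}^{\infty} P\Big(\sup_{\theta\in\Theta}\langle U_J,\xi_J(\theta)\rangle > \frac{c}{\sqrt{y}}\Big)h_J(y)\,dy, \tag{7.2}$$

where $w_0 = (1 - r_0^2/2)$ and $r_0$ is the critical radius of the tube. The volume-of-tube
formula given in (A.4) is exact when $y \in [c^2, c^2/w_0^2]$ and it is only approximate when
$y \in [c^2/w_0^2, \infty)$. In the former case, from (3.9)

$$P\Big(\sup_{\theta\in\Theta}\langle U_J,\xi_J(\theta)\rangle > \frac{c}{\sqrt{y}}\Big) = \sum_{t=0}^{d}\frac{\zeta_t^J}{A_{d+1-t}}\,P\Big[B_{(d+1-t)/2,\,(J-d-1+t)/2} \ge \frac{c^2}{y}\Big].$$

We express the first integral in (7.2) as $F(c^2) - F(c^2/w_0^2)$, where

$$F(x) = \sum_{t=0}^{d}\frac{\zeta_t^J}{A_{d+1-t}}\int_x^{\infty} P\Big[B_{(d+1-t)/2,\,(J-d-1+t)/2} \ge \frac{c^2}{y}\Big]h_J(y)\,dy.$$

Note that the second integral in (7.2) is $\ge 0$, providing a lower bound. Furthermore,

$$\int_{c^2/w_0^2}^{\infty} P\Big(\sup_{\theta\in\Theta}\langle U_J,\xi_J(\theta)\rangle > \frac{c}{\sqrt{y}}\Big)h_J(y)\,dy \le \int_{c^2/w_0^2}^{\infty} h_J(y)\,dy = P\Big(\chi^2_J > \frac{c^2}{w_0^2}\Big).$$

Therefore, $F(c^2) - F(c^2/w_0^2) \le P[\sup_{\theta\in\Theta} Z_J(\theta) > c] \le F(c^2) - F(c^2/w_0^2) + P(\chi^2_J > c^2/w_0^2)$.
As $c \to \infty$, $F(c^2) - F(c^2/w_0^2) \sim F(c^2)$. Therefore, $P(\sup_{\theta\in\Theta} Z_J(\theta) > c) \sim F(c^2)$ as
$c \to \infty$. By performing the integration in $F(c^2)$, it follows that

$$P\Big(\sup_{\theta\in\Theta} Z_J(\theta) > c\Big) = \sum_{t=0}^{d}\frac{\zeta_t^J}{A_{d+1-t}}\,P\big(\chi^2_{d+1-t} > c^2\big) + o\big[c^{-1}\exp(-c^2/2)\big] \quad\text{as } c \to \infty.$$

When the Karhunen-Loève expansion is infinite, the above result for the truncated
Gaussian random field $Z_J(\theta)$ is extended by letting $J \to \infty$ as follows. Uniform convergence
of the Karhunen-Loève expansion implies that $Z_J(\theta) \to Z(\theta)$ uniformly and
hence

$$P\Big(\sup_{\theta\in\Theta} Z_J(\theta) > c\Big) \to P\Big(\sup_{\theta\in\Theta} Z(\theta) > c\Big) \quad\text{as } J \to \infty. \tag{7.3}$$

The volume-of-tube formula given in (A.4) is in terms of $\zeta_t^J$; however, as $J \to \infty$
and for $t = 0,\ldots,d$, $\zeta_t^J \to \zeta_t$, the corresponding geometric term found via $\rho(\theta,\theta^\dagger)$.
Therefore the result (3.10) holds. For example, the expression for $\kappa_0 = \zeta_0$ is derived by
approximating $\mathcal{M}$ by a series of short line segments to obtain

$$\kappa_0 = \int_{\Theta}\det\big[\nabla_1\nabla_2^T\rho(\theta,\theta^\dagger)\big|_{\theta^\dagger=\theta}\big]^{1/2}\,d\theta.$$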



Remark 3: We take sufficiently large $J$ so that the relation (7.3) holds. In practice, it is
not necessary to employ a truncated covariance function (3.7) that requires specification
of $J$ and the manifold $\mathcal{M}$. Our calculations are carried out in terms of the covariance
function $C(\theta,\theta^\dagger)$. In effect, knowledge of $J$ and the specification of $\mathcal{M}$ do not arise in
practice.

Appendix A: Explicit Expressions for Geometric Constants in (3.10)

We consider a finite Karhunen-Loève expansion with $J$ terms in deriving the geometric
constants. As a first step, we partition the manifold $\mathcal{M}$, correspondingly the tube
$T(r, \mathcal{M})$ and the parameter space $\Theta$, into various boundary regions. First, each point in
$T(r, \mathcal{M})$ is linked to a point in $\mathcal{M}$ by a perpendicular projection. Correspondingly, each
point in $\mathcal{M}$ is linked to a set of points in $T(r, \mathcal{M})$. Second, partition $\mathcal{M}$ into regions
$\mathcal{M}_0, \ldots, \mathcal{M}_d$ based on the dimension of the linked sets, where $\mathcal{M}_0$ represents the main
part of the manifold and $\mathcal{M}_1, \ldots, \mathcal{M}_d$ represent boundary regions. For example, when
$d = 1$, $\mathcal{M}_1$ corresponds to the two end-points and $\mathcal{M}_0$ corresponds to the rest of the tube
(see Fig. 1). If $d = 2$ and the manifold $\mathcal{M}$ is a polygon, then $\mathcal{M}_2$ represents the corners, $\mathcal{M}_1$ the
edges and $\mathcal{M}_0$ the interior. In effect, for a $d$-dimensional manifold $\mathcal{M}$, we can partition
both $T(r, \mathcal{M})$ and the space $\Theta$ into $(d+1)$ regions to express $\vartheta(r, \mathcal{M}) = V_0 + V_1 + \cdots + V_d$.
The main part of the tube can be represented as

$$\big\{(1 + \|\tau\|^2)^{-1/2}\,[\xi(\theta) + Q(\theta)\,\tau] : \theta \in \Theta,\ \|\tau\| \le \tau_0\big\}, \tag{A.1}$$

where $\tau_0 = \sqrt{1 - w^2}/w$ and $Q(\theta)$ is an orthonormal basis matrix for the normal space at
$\xi(\theta)$. Provided that this transformation is one-to-one, the volume $V_0$ can be expressed as
$V_0 = \int_{\Theta}\int_{\|\tau\|\le\tau_0}\det[J(\theta,\tau)]\,d\tau\,d\theta$, where $J(\theta,\tau)$ is the Jacobian of the representation (A.1).
The determinant of the Jacobian can be expressed as $\det[J(\theta,\tau)] = P_\theta(\tau)\,(1 + \|\tau\|^2)^{-J/2}$,
where $P_\theta(\tau)$ is a $d$th degree polynomial in $\tau$ with coefficients depending on $\theta$. This
representation allows the integral defining $V_0$ to be split into its $\theta$ and $\tau$ components,
leading to a finite series expansion, for a truncated $Z_J(\theta)$,

Vo = Vft^t-j -r^ P [B(d+i~t)/2,{j~d~i+t)/2 > w^] , 

where $\kappa_t$ are the polynomial coefficients integrated over $\mathcal{M}$ for even-order $t$ and the
partial beta terms arise from integrating the $\tau$ parts. Odd-order terms integrate to 0
by symmetry; therefore, we set $\kappa_t = 0$ when $t$ is odd. Recall that $A_t = 2\pi^{t/2}/\Gamma(t/2)$ is
the $(t-1)$-dimensional volume of the unit sphere $S^{t-1}$ in $\mathbb{R}^t$. The first constant $\kappa_0$
is the $d$-dimensional volume of the manifold $\mathcal{M}$, represented in terms of the covariance
function, expressed as

$$\kappa_0 = \int_{\Theta} C(\theta,\theta)^{-(d+1)/2}\,\det\begin{bmatrix} C(\theta,\theta^\dagger) & \nabla_2^T C(\theta,\theta^\dagger)\\ \nabla_1 C(\theta,\theta^\dagger) & \nabla_1\nabla_2^T C(\theta,\theta^\dagger)\end{bmatrix}^{1/2}_{\theta^\dagger=\theta} d\theta, \tag{A.2}$$

where $\nabla_1$ and $\nabla_2$ denote vectors of partial derivative operators with respect to the
components of $\theta$ and $\theta^\dagger$ respectively. The geometric constant $\kappa_2$ is the measure of
curvature of $\mathcal{M}$.

The process for handling boundary corrections is similar. To compute the main
boundary corrections, represent the half-tubes around boundaries in a form similar to
(A.1), with $Q(\theta)$ supplemented by a vector tangent to $\mathcal{M}$ but normal to $\partial\mathcal{M}$, the
boundary of $\mathcal{M}$. The vector $\tau$ is then restricted to a half-sphere. Following the derivation
of Weyl (1939), we obtain a series of the form, for truncated $Z_J(\theta)$,

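$$V_1 = \frac{\ell_0}{2} \frac{A_J}{A_d}\, P\!\left[B_{d/2,\,(J-d)/2} \geq w^2\right] + \frac{\ell_1}{2\pi} \frac{A_J}{A_{d-1}}\, P\!\left[B_{(d-1)/2,\,(J-d+1)/2} \geq w^2\right] + \frac{\ell_2}{4\pi} \frac{A_J}{A_{d-2}}\, P\!\left[B_{(d-2)/2,\,(J-d+2)/2} \geq w^2\right] + \cdots,$$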

where the $\ell_t$ terms are the integrals of polynomial coefficients. The first term, $\ell_0$, is the
$(d-1)$-dimensional volume of $\partial\mathcal{M}$, which has a form similar to (A.2), summed over
each of the boundary faces. It is important to note that odd-order terms no longer
disappear; $\ell_1$ is a measure of rotation of $\partial\mathcal{M}$ and $\ell_2$ is a measure of curvature similar to
$\kappa_2$. Similarly, at corners where two boundary faces meet, we can represent

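$$V_2 = \frac{\nu_0}{2\pi} \frac{A_J}{A_{d-1}}\, P\!\left[B_{(d-1)/2,\,(J-d+1)/2} \geq w^2\right] + \frac{\nu_1}{4\pi} \frac{A_J}{A_{d-2}}\, P\!\left[B_{(d-2)/2,\,(J-d+2)/2} \geq w^2\right] + \cdots,$$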

where $\nu_0$ measures the rotation angles in the regions of $\partial^2\mathcal{M}$ (the boundary of $\partial\mathcal{M}$)
where two boundary faces meet and $\nu_1$ is a combination of rotation angles and rotation
of the edges. Currently, our software library enables computing all the terms given in
(3.10), effectively yielding a complete implementation of the tube formula up to $d = 3$.
To the best of our knowledge, there exists no method for general implementation of
higher-order terms with boundary corrections.

Remark 4: When $d = 2$, the fourth-order coefficients are $\ell_2 = \nu_1 = m_0 = 0$. Additionally,
the Euler-Poincaré characteristic $\mathcal{E}$ (Knowles and Siegmund, 1989) satisfies $\kappa_2 + \ell_1 + \nu_0 =
2\pi\mathcal{E} - \kappa_0$, eliminating the need to compute $\kappa_2$, $\ell_1$ and $\nu_0$ directly. The Euler-Poincaré
characteristic is the number of pieces making up the manifold, minus the number of
holes. When $\Theta$ is a compact and convex set and $C(\theta, \theta) > 0$ for all $\theta \in \Theta$, then
$\mathcal{E} = 1$.
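The shortcut in the remark above amounts to one line of arithmetic. A minimal sketch follows, assuming $d = 2$ and that $\kappa_0$ has already been computed; the function name `second_order_sum` and the input value (the area of one octant of the unit sphere, used purely as a hypothetical $\kappa_0$) are illustrative and do not come from the article.

```python
import math

def second_order_sum(kappa0, euler_char=1):
    """Return kappa_2 + ell_1 + nu_0 = 2*pi*E - kappa_0 (valid for d = 2)."""
    return 2.0 * math.pi * euler_char - kappa0

# Hypothetical input: kappa_0 = pi/2 with a single piece and no holes (E = 1).
combined = second_order_sum(math.pi / 2, euler_char=1)
print(combined)   # equals 3*pi/2 for this input
```

Only the combined quantity is recovered this way; the individual constants $\kappa_2$, $\ell_1$ and $\nu_0$ are not separately identified, which is all the tube formula requires when $d = 2$.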

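Combining the above results, the tube formula, up to fourth-order terms,
can be expressed as

$$\begin{aligned}
\vartheta(r, \mathcal{M}) ={}& \kappa_0 \frac{A_J}{A_{d+1}}\, P\!\left[B_{(d+1)/2,\,(J-d-1)/2} \geq w^2\right] + \frac{\ell_0}{2} \frac{A_J}{A_d}\, P\!\left[B_{d/2,\,(J-d)/2} \geq w^2\right] \\
&+ \frac{\kappa_2 + \ell_1 + \nu_0}{2\pi} \frac{A_J}{A_{d-1}}\, P\!\left[B_{(d-1)/2,\,(J-d+1)/2} \geq w^2\right] \\
&+ \frac{\ell_2 + \nu_1 + m_0}{4\pi} \frac{A_J}{A_{d-2}}\, P\!\left[B_{(d-2)/2,\,(J-d+2)/2} \geq w^2\right],
\end{aligned} \tag{A.3}$$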



where $m_0$ measures the size of wedges at corners where three boundary faces of $\mathcal{M}$ meet.
Completing the evaluation of all terms leads to the series



$$\vartheta(r, \mathcal{M}) = \sum_{t=0}^{d} \zeta_t \frac{A_J}{A_{d+1-t}}\, P\!\left[B_{(d+1-t)/2,\,(J-d-1+t)/2} \geq w^2\right]. \tag{A.4}$$



The dominant term can be expressed as 



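$$\zeta_0 = \int_{\Theta} \left\| \frac{d}{d\theta}\, \xi(\theta) \right\| d\theta = \int_{\Theta} \det\!\left[ \nabla_1 \nabla_2^{T} \rho_J(\theta, \theta) \right]^{1/2} d\theta, \tag{A.5}$$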



where $\nabla_1$ and $\nabla_2$ are partial derivative operators with respect to the first and second
arguments of $\rho_J(\cdot, \cdot)$, respectively.

The following correspondence (up to $t = 3$) holds: $\zeta_0 = \kappa_0$, $\zeta_1 = \ell_0/2$, $\zeta_2 = (\kappa_2 + \ell_1 +
\nu_0)/(2\pi)$ and $\zeta_3 = (\ell_2 + \nu_1 + m_0)/(4\pi)$. The tube formula is exact for tubes with radius
$r < r_0$, the critical radius.
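To make the series concrete, the following sketch evaluates a tube volume of the form $\sum_{t} \zeta_t \,(A_J/A_{d+1-t})\, P[B_{(d+1-t)/2,\,(J-d-1+t)/2} \geq w^2]$, which is the form assumed here for (A.4). It is not the authors' software library: the function names are hypothetical, and the beta tail probability is computed with a standard continued-fraction evaluation of the regularized incomplete beta function so that only the Python standard library is needed. The two examples are simple sets on the unit sphere in $\mathbb{R}^3$ whose tube volumes are known in closed form.

```python
import math

def sphere_area(t):
    """A_t = 2*pi^(t/2) / Gamma(t/2), the surface area of the unit sphere in R^t."""
    return 2.0 * math.pi ** (t / 2.0) / math.gamma(t / 2.0)

def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_tail(a, b, x):
    """P[B_{a,b} >= x] for a Beta(a, b) random variable, with a, b > 0."""
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        cdf = front * _beta_cf(a, b, x) / a
    else:  # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a)
        cdf = 1.0 - front * _beta_cf(b, a, 1.0 - x) / b
    return 1.0 - cdf

def tube_volume(zeta, d, J, w):
    """Evaluate the assumed series using the coefficients zeta[0], zeta[1], ..."""
    if not 0.0 < w < 1.0 or J <= d + 1 or len(zeta) > d + 1:
        raise ValueError("need 0 < w < 1, J > d + 1 and at most d + 1 coefficients")
    total = 0.0
    for t, z in enumerate(zeta):
        a = (d + 1 - t) / 2.0
        b = (J - d - 1 + t) / 2.0
        total += z * sphere_area(J) / sphere_area(d + 1 - t) * beta_tail(a, b, w * w)
    return total

w = 0.5
# A single point (d = 0, zeta_0 = 1): the tube is a spherical cap of area 2*pi*(1 - w).
cap = tube_volume([1.0], d=0, J=3, w=w)
# Half of a great circle (d = 1): zeta_0 = length = pi, zeta_1 = ell_0 / 2 = 1.
arc = tube_volume([math.pi, 1.0], d=1, J=3, w=w)
print(cap, arc)
```

In the second example the first term gives the half-band around the arc, $2\pi\sqrt{1 - w^2}$, and the boundary term adds the two half-caps at the end-points, $2\pi(1 - w)$, which illustrates how $\zeta_1 = \ell_0/2$ enters. Dividing a tube volume by $A_J$ gives the fraction of the sphere it covers.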